[openstack-dev] [Neutron][LBaaS] HA functionality discussion

Carlos Garza carlos.garza at rackspace.com
Fri Apr 18 01:11:54 UTC 2014


On Apr 17, 2014, at 5:49 PM, Stephen Balukoff <sbalukoff at bluebox.net>
 wrote:

Heyas, y'all!

So, given both the prioritization and usage info on HA functionality for Neutron LBaaS here:  https://docs.google.com/spreadsheet/ccc?key=0Ar1FuMFYRhgadDVXZ25NM2NfbGtLTkR0TDFNUWJQUWc&usp=sharing

It's clear that:

A. HA seems to be a top priority for most operators
B. Almost all load balancer functionality deployed is done so in an Active/Standby HA configuration

I know there's been some round-about discussion about this on the list in the past (which usually got stymied in "implementation details" disagreements), but it seems to me that with so many players putting a high priority on HA functionality, this is something we need to discuss and address.

This is also apropos, as we're talking about doing a major revision of the API, and it probably makes sense to seriously consider if or how HA-related stuff should make it into the API. I'm of the opinion that almost all the HA stuff should be hidden from the user/tenant, but that the admin/operator at the very least is going to need to have some visibility into HA-related functionality. The hope here is to discover what things make sense to have as a "least common denominator" and what will have to be hidden behind a driver-specific implementation.

I certainly have a pretty good idea how HA stuff works at our organization, but I have almost no visibility into how this is done elsewhere, leastwise not enough detail to know what makes sense to write API controls for.

So! Since gathering data about actual usage seems to have worked pretty well before, I'd like to try that again. Yes, I'm going to be asking about implementation details, but this is with the hope of discovering any "least common denominator" factors which make sense to build API around.

For the purposes of this document, when I say "load balancer devices" I mean either physical or virtual appliances, or software executing on a host somewhere that actually does the load balancing. It need not directly correspond with anything physical... but probably does. :P

And... all of these questions are meant to be interpreted from the perspective of the cloud operator.

Here's what I'm looking to learn from those of you who are allowed to share this data:

1. Are your load balancer devices shared between customers / tenants, not shared, or some of both?
     Our load balancers are not shared between customers, which we call accounts. If you're referring to networking, then yes, they are on the same VLAN. Our clusters are basically a physical grouping of 4 or 5 Stingray devices that share IPs on the VIP side. The configs are created on all Stingray nodes in a cluster. If a Stingray load balancer goes down, all its VIPs will be taken over by one of the other 4 or 5 machines. We achieve HA by moving a noisy customer's IPs to another Stingray node. The machine taking over an IP sends a gratuitous ARP response so the router re-trains its ARP table. Usually we keep 2 Stingray nodes available for failover. We could have spread the load across all boxes evenly, but we felt that if we were near the end of the capacity for a given cluster and one of the nodes tanked, performance would have degraded because the other nodes were already nearing capacity.

    We also have the usual dual-switch, dual-router configuration in case one dies.

1a. If shared, what is your strategy to avoid or deal with collisions of customer rfc1918 address space on back-end networks? (For example, I know of no load balancer device that can balance traffic for both customer A and customer B if both are using the 10.0.0.0/24 subnet for their back-end networks containing the nodes to be balanced, unless an extra layer of NATing is happening somewhere.)

    We order a set of CIDR blocks from our backbone and route them to our cluster via a 10 Gb/s link, which in our bigger clusters can be upgraded via link bonding.
Downstream we have two routes: one for our own internal ServiceNet 10.0.0.0/8 space, and the public Internet for everything not on ServiceNet. Our pool members are specified by CIDR block only, with no association to a layer-2 network. When customers create their cloud servers they are assigned an IP within the ServiceNet 10.0.0.0/8 address space and also get a publicly routable IP address. At that point the customer can achieve isolation via iptables or whatever tools their VM supports. In theory a user could mistakenly punch in a node IP address that doesn't belong to them, but that just means the LB will route to only one machine, and the load balancer would be useless at that point. We don't charge our users for bandwidth going across ServiceNet, since each DC has its own ServiceNet and our customers want the load balancer close to their servers anyway. If they want to host back-end servers on, say, Amazon or some other host, then the load balancer will unfortunately route over the public Internet for those. I'm not sure why customers would want to do this, but we were flexible enough to support it. In short, HA is achieved through shared IPs between our Stingray nodes.
We have 2 failover nodes that basically do nothing on standby, just in case an active node suddenly dies, so I guess you could call this HA N+1. We also divide the cluster into two cabinets, with a failover node in each, heaven forbid a whole cabinet should suddenly fail. We've never seen this happen, knock on wood.
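The takeover step described above (a surviving node announcing a failed node's IPs via gratuitous ARP) can be sketched in Python. This is an illustrative helper, not Stingray's actual mechanism: it only constructs the raw Ethernet/ARP frame a takeover node would broadcast (actually sending it would require a raw AF_PACKET socket and root privileges, and the MAC/IP values here are made up):

```python
import struct

def gratuitous_arp_frame(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP reply announcing `ip` at `mac`.

    mac: 6-byte hardware address; ip: 4-byte IPv4 address.
    Broadcasting this frame lets the upstream router re-train its
    ARP table so traffic for the failed node's VIP follows the
    surviving node.
    """
    broadcast = b"\xff" * 6
    # 14-byte Ethernet header: dst, src, EtherType 0x0806 (ARP)
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)
    # 28-byte ARP payload
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware / protocol address lengths
        2,              # opcode: reply (gratuitous)
        mac, ip,        # sender hw / proto address (the VIP being claimed)
        broadcast, ip,  # target: broadcast hw addr, same IP
    )
    return eth_header + arp

frame = gratuitous_arp_frame(b"\x02\x00\x00\x00\x00\x01", bytes([192, 0, 2, 10]))
assert len(frame) == 42  # 14-byte Ethernet header + 28-byte ARP payload
```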


2. What kinds of metrics do you use in determining load balancing capacity?

    So far we've been measuring bandwidth for the most part, as it usually caps out before CPU does. Our newest Stingray nodes have 24 cores. We of course also gather metrics for IP space left (so we can order more ahead of time). We have noticed that we are limited to horizontal scaling of around 6 Stingray nodes: CPU load goes up past 6 nodes, which we have determined is because the rapidly changing configs must be synced across all the Stingray nodes. Stingray has a pretty nasty flaw in how it sends its configs to the other Stingray nodes in the cluster.

    In the case of SSL, if a user runs SSL in mixed mode (meaning both HTTP and HTTPS; not sure why they'd do that), we actually set up two virtual servers, transparent to the customer, so we track the SSL bandwidth separately but using the same SNMP call.

3. Do you operate with a pool of unused load balancer device capacity (which a cloud OS would need to keep track of), or do you spin up new capacity (in the form of virtual servers, presumably) on the fly?

    Kind of answered in question 1. This doesn't apply much to us, as we use physical load balancers behind our API. For CLB 2.0 we would like to see how we could achieve the same level of HA in the virtual space.

3a. If you're operating with a availability pool, can you describe how new load balancer devices are added to your availability pool?  Specifically, are there any steps in the process that must be manually performed (ie. so no API could help with this)?

    The API could help with some aspects of this. For example, we have, and are advocating for, a separate management API, distinct from the public one, that can do things like tell the provisioner (what you're calling a scheduler) when new capacity is available and how to route to it, and store this in the database for the public API to use in determining how to allocate resources. Our management API in particular is used to add IPv4 address space to our database once the backbone routes it to us. Our current process involves the classic "Hey backbone, I'd like to order a new /22, we're running low on IPs." The management interface can then be called to add the CIDR block so that it can track the IPs in its database.
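The register-a-block-then-allocate flow above can be sketched with the standard library's ipaddress module. This is a toy model, not Rackspace's actual management API; the class and method names are invented for illustration:

```python
import ipaddress

class VipPool:
    """Toy model of the management-API side of IP tracking: a CIDR
    block is registered once the backbone routes it to the cluster,
    and the public API then allocates VIPs out of it."""

    def __init__(self):
        self.free = []         # available addresses, in routing order
        self.allocated = set()

    def add_block(self, cidr: str):
        """Register a newly routed block (e.g. a fresh /22)."""
        net = ipaddress.ip_network(cidr)
        self.free.extend(net.hosts())

    def allocate(self):
        """Hand the next free VIP to the public API."""
        addr = self.free.pop(0)
        self.allocated.add(addr)
        return addr

pool = VipPool()
pool.add_block("203.0.113.0/24")   # documentation range, stands in for a routed block
vip = pool.allocate()
assert str(vip) == "203.0.113.1"
```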


4. How are new devices 'registered' with the cloud OS? How are they removed or replaced?

5. What kind of visibility do you (or would you) allow your user base to see into the HA-related aspects of your load balancing services?
    We don't. We view HA in terms of redundant hardware and floating IPs, and since end users don't control those, it's not visible to them. We do state our four-nines uptime, which hasn't been broken, as well as compensation for violations of our end of the SLA agreement.

http://www.rackspace.com/information/legal/cloud/sla
https://status.rackspace.com/

6. What kind of functionality and visibility do you need into the operations of your load balancer devices in order to maintain your services, troubleshoot, etc.? Specifically, are you managing the infrastructure outside the purview of the cloud OS? Are there certain aspects which would be easier to manage if done within the purview of the cloud OS?

     We wrote SNMP tools to monitor the Stingray nodes, which are executed by our API nodes. Stingray offers a rich OID MIB that allows us to track pretty much anything, but we only look at bandwidth in, bandwidth out, and the number of concurrent connections. I'm actually considering adding CPU statistics now.
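Bandwidth metrics like these are typically derived from two successive samples of an SNMP octet counter, which can wrap around. A minimal sketch of that calculation (the function name and sample values are illustrative, not from the tooling described above):

```python
def counter_rate(prev: int, curr: int, interval_s: float, bits: int = 64) -> float:
    """Per-second rate from two successive SNMP counter samples,
    handling a single counter wrap (Counter32 or Counter64)."""
    if curr >= prev:
        delta = curr - prev
    else:
        # counter rolled over once between samples
        delta = (1 << bits) - prev + curr
    return delta / interval_s

# two samples of an ifHCInOctets-style 64-bit byte counter, 30 s apart
rate = counter_rate(1_000_000, 4_000_000, 30.0)
assert rate == 100_000.0  # bytes/sec
```

Polling faster than a Counter32 can wrap at the link's line rate (or using 64-bit counters, as above) is what keeps the single-wrap assumption safe.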


7. What kind of network topology is used when deploying load balancing functionality? (ie. do your load balancer devices live inside or outside customer firewalls, directly on tenant networks? Are you using layer-3 routing? etc.)

Just pure layer 3. This limitation has left us wanting a private networking solution, and during that investigation we arrived here in Neutron/LBaaS.



8. Is there any other data you can share which would be useful in considering features of the API that only cloud operators would be able to perform?

   Shared and failover-able IPs are desired, but much of the HA stuff will come from the driver/provider. I just think floating IPs need to be supported by the API, or at least be queryable so you can see whether a provider supports them.

And since we're one of these operators, here are my responses:

1. We have both shared load balancer devices and private load balancer devices.

1a. Our shared load balancers live outside customer firewalls, and we use IPv6 to reach individual servers behind the firewalls "directly." We have followed a careful deployment strategy across all our networks so that IPv6 addresses between tenants do not overlap.

Yeah, us too. We hash the tenant_id into 32 bits and use it in bits 64-96, leaving the customer 32 bits to play with for their hosts. If they need more than 4 billion hosts, then we have bigger problems. Our cluster is a /48, so we're wasting 16 bits on nothing in the middle.
Cluster, Tenant id, host_id
CCCC:CCCC:CCCC:0000:TTTT:TTTT:HHHH:HHHH
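The layout above can be sketched with the standard library. The hash function isn't specified in the thread, so crc32 here is just a stand-in, and the function name, prefix, and tenant id are made up for illustration:

```python
import ipaddress
import zlib

def tenant_v6(cluster_prefix: str, tenant_id: str, host_id: int) -> ipaddress.IPv6Address:
    """Compose CCCC:CCCC:CCCC:0000:TTTT:TTTT:HHHH:HHHH.

    cluster_prefix: the cluster's /48 (bits 0-47; bits 48-63 stay
    zero, the "wasted" 16 bits). tenant_id is hashed into 32 bits
    (crc32 is a placeholder for whatever hash is really used) and
    placed in bits 64-95, leaving the low 32 bits for the host.
    """
    net = ipaddress.ip_network(cluster_prefix)
    assert net.prefixlen == 48
    tenant_hash = zlib.crc32(tenant_id.encode()) & 0xFFFFFFFF
    addr_int = int(net.network_address) | (tenant_hash << 32) | (host_id & 0xFFFFFFFF)
    return ipaddress.IPv6Address(addr_int)

addr = tenant_v6("2001:db8:1::/48", "tenant-1234", 7)
# all hosts for one tenant share the same upper 96 bits
assert addr.exploded.endswith("0007")
```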



2. The most useful ones for us are "number of appliances deployed" and "number and type of load balancing services deployed" though we also pay attention to:
* Load average per "active" appliance
* Per appliance number and type of load balancing services deployed
* Per appliance bandwidth consumption
* Per appliance connections / sec
* Per appliance SSL connections / sec

Since our devices are software appliances running on linux we also track OS-level metrics as well, though these aren't used directly in the load balancing features in our cloud OS.

3. We operate with an availability pool that our current cloud OS pays attention to.

3a. Since the devices we use correspond to physical hardware this must of course be rack-and-stacked by a datacenter technician, who also does initial configuration of these devices.

4. All of our load balancers are deployed in an active / standby configuration. Two machines which make up an active / standby pair are registered with the cloud OS as a single unit that we call a "load balancer cluster." Our availability pool consists of a whole bunch of these load balancer clusters. (The devices themselves are registered individually at the time the cluster object is created in our database.) There are a couple manual steps in this process (currently handled by the datacenter techs who do the racking and stacking), but these could be automated via API. In fact, as we move to virtual appliances with these, we expect the entire process to become automated via API (first cluster primitive is created, and then "load balancer device objects" get attached to it, then the cluster gets added to our availability pool.)

Removal of a "cluster" object is handled by first evacuating any customer services off the cluster, then destroying the load balancer device objects, then the cluster object. Replacement of a single load balancer device entails removing the dead device, adding the new one, synchronizing configuration data to it, and starting services.

5. At the present time, all our load balancing services are deployed in an active / standby HA configuration, so the user has no choice or visibility into any HA details. As we move to Neutron LBaaS, we would like to give users the option of deploying non-HA load balancing capacity. Therefore, the only visibility we want the user to get is:

* Choose whether a given load balancing service should be deployed in an HA configuration ("flavor" functionality could handle this)
* See whether a running load balancing service is deployed in an HA configuration (and see the "hint" for which physical or virtual device(s) it's deployed on)
* Give a "hint" as to which device(s) a new load balancing service should be deployed on (ie. for customers looking to deploy a bunch of test / QA / etc. environments on the same device(s) to reduce costs).

Note that the "hint" above corresponds to the "load balancing cluster" alluded to above, not necessarily any specific physical or virtual device. This means we retain the ability to switch out the underlying hardware powering a given service at any time.

Users may also see usage data, of course, but that's more of a generic stats / billing function (which doesn't have to do with HA at all, really).

6. We need to see the status of all our load balancing devices, including availability, current role (active or standby), and all the metrics listed under 2 above. Some of this data is used for creating trend graphs and business metrics, so being able to query the current metrics at any time via API is important. It would also be very handy to query specific device info (like revision of software on it, etc.) Our current cloud OS does all this for us, and having Neutron LBaaS provide visibility into all of this as well would be ideal. We do almost no management of our load balancing services outside the purview of our current cloud OS.

7. Shared load balancers must live outside customer firewalls, private load balancers typically live within customer firewalls (sometimes in a DMZ). In any case, we use layer-3 routing (distributed using routing protocols on our core networking gear and static routes on customer firewalls) to route requests for "service IPs" to the "highly available routing IPs" which live on the load balancers themselves. (When a fail-over happens, at a low level, what's really going on is the "highly available routing IPs" shift from the active to standby load balancer.)

We have contemplated using layer-2 topology (ie. directly connected on the same vlan / broadcast domain) and are building a version of our appliance which can operate in this way, potentially reducing the reliance on layer-3 routes (and making things more friendly for the OpenStack environment, which we understand probably isn't ready for layer-3 routing just yet).

8. I wrote this survey, so none come to mind for me. :)

Stephen

--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
