[Octavia]-Seeking performance numbers on Octavia

Singh, Prabhjit Prabhjit.Singh22 at T-Mobile.com
Mon Jul 22 00:24:14 UTC 2019


Hi Michael,

Thanks for taking the time to send me your input and valuable suggestions. I do remember meeting you at the Denver Summit and attending a couple of your sessions.
If you wouldn't mind, I have a few more questions; your answers will help me decide whether I should continue to invest in offering Octavia as one of our available LBs.

1. Based on your response and the amount of time you are investing in supporting Octavia, what are some typical use cases? For example, when load balancing web traffic, how many transactions/connections can be expected at a minimum? I understand you mentioned that it is hard to performance test Octavia, but some real-world results from your testing, and examples of how customers have adopted Octavia, would help me set expectations.

2. We are considering Octavia as one of our offerings, in a self-serve model. Do you know of any customers who have been able to use Octavia as one of their primary load balancers, and is there any encouraging feedback you have gotten on Octavia?

3. You suggested increasing the RAM size; I could go about creating a whole new flavor for the amphorae.

4. I also noticed that in haproxy.conf the maxconn is set to 2000. Should I increase this, and does it affect the connections per server? You said 64,000 connections per backend server, so if I have 10 servers, can I expect somewhere close to 640,000 sessions? (I have included a rough sketch of the tweak I am considering after these questions.)

5. Based on some of the limitations and the development work in progress, I think the most important features that would make Octavia a really solid offering are Active-Active and autoscaling. I brought this up with you in our brief conversation at the summit, and you mentioned that it is not a top priority at this time and that you are looking for some help. I have noticed that a lot of documentation has been updated for this feature. Do you think, with the available documentation and progress, I could spin up a distributor and manage sessions between amphorae, or is it not complete yet?

6. We have a TripleO setup; do you think I can make the above tweaks with TripleO?
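
As mentioned under question 4, here is a rough sketch of the kind of tweak I am considering. This is only a sketch: "web-listener" is a placeholder name, and I am assuming the connection limit can be adjusted on an existing listener with the client version we run (please correct me if the option differs):

  # Hypothetical example: raise the listener connection limit (which I
  # understand maps to HAProxy's maxconn) to 100000 concurrent connections.
  # "web-listener" is a placeholder listener name.
  openstack loadbalancer listener set --connection-limit 100000 web-listener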


Thanks & Regards

Prabhjit Singh 
Systems Design and Strategy - Magentabox
| O: (973) 397-4819 | M: (973) 563-4445



-----Original Message-----
From: Michael Johnson <johnsomor at gmail.com> 
Sent: Friday, July 19, 2019 6:00 PM
To: Singh, Prabhjit <Prabhjit.Singh22 at T-Mobile.com>
Cc: openstack-discuss at lists.openstack.org
Subject: Re: [Octavia]-Seeking performance numbers on Octavia

[External]


Hi Prabhjit,

As you have mentioned, it is very challenging to get accurate performance results in cloud environments. There are a large number (very large, in fact) of factors that can impact the overall performance of OpenStack and Octavia.

In our OpenDev testing environment, we only have software-emulated virtual machines available (QEMU running with the TCG engine), which perform extremely poorly. This means that the testing environment does not reflect how the software is used in real-world deployments.
For example, simply booting a VM can take up to ten minutes on QEMU with TCG, when it takes about twenty seconds on a real OpenStack deployment.
With this resource limitation, we cannot effectively run performance benchmarking test jobs on the OpenDev environment.

Because of this, we don't publish performance numbers as they will not reflect what you can achieve in your environment.

Let me try to speak to your bullet points:
1. The Octavia team has never (to my knowledge) claimed the Amphora driver is "carrier grade". We do consider the Amphora driver to be "operator grade", which speaks to a cloud operator's perspective versus the previous offering, which did not support high availability or have appropriate maintenance tooling, upgrade paths, performance, etc.
To me, "carrier grade" has an additional level of requirements including performance, latency, scale, and availability SLAs. This is not what the Octavia Amphora driver is currently ready for. That said, third party provider drivers for Octavia may be able to provide a "carrier grade" level of load balancing for OpenStack.
2. As for performance tuning, much of this is either handled automatically by Octavia or depends on the application you are load balancing and your cloud deployment. For example, we have many configuration settings to tune how many retries we attempt when interacting with other services. In performant and stable clouds these can be tuned down; in others, the defaults may be appropriate. If you would like faster failover, at the expense of slightly more network traffic, you can tune the health monitoring and keepalived_vrrp settings. We do not currently have a performance tuning guide for Octavia, but we would support someone authoring one.
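As a rough illustration of the kind of settings I mean (the values below are placeholders to experiment with, not recommendations, and the right numbers depend entirely on your cloud), these live in octavia.conf:

  # Illustrative octavia.conf excerpt -- placeholder values, not recommendations.
  [haproxy_amphora]
  # How many times, and how often, the controller retries connecting to an amphora.
  connection_max_retries = 120
  connection_retry_interval = 5

  [health_manager]
  # How often amphorae report health, and how long before one is considered failed.
  heartbeat_interval = 10
  heartbeat_timeout = 60

  [keepalived_vrrp]
  # Lower advertisement/check intervals give faster active/standby failover
  # at the cost of a little more keepalive traffic.
  vrrp_advert_int = 1
  vrrp_check_interval = 5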
3. We do not currently have a guide for this. I will say that with the version of HAProxy currently being shipped with the distributions, going beyond 1 vCPU per amphora does not gain you much. With the release of HAProxy 2.0 this has changed, and we expect to add support for vertically scaling the amphora in future releases. Disk space is only necessary if you are storing the flow logs locally, which I would not recommend for a performance load balancer (see the notes in the log offloading guide:
https://docs.openstack.org/octavia/latest/admin/log-offloading.html).
Finally, the RAM usage is a function of the number of concurrent connections and whether you enable TLS on the load balancer. For typical load balancing workloads, the default is fine. However, if you have high connection counts and/or TLS offloading, you may want to experiment with increasing the available RAM.
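For example, one way to do that (a sketch only; the flavor name and sizes below are made up, so size them to your expected workload) is to create a larger nova flavor and point Octavia at it:

  # Hypothetical example: create a larger nova flavor for the amphorae.
  openstack flavor create --vcpus 1 --ram 4096 --disk 3 --private amphora-4g

  # Then reference the new flavor's ID in octavia.conf so newly built
  # amphorae use it:
  #   [controller_worker]
  #   amp_flavor_id = <ID of the flavor created above>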
4. The source IP limitation is a known issue
(https://storyboard.openstack.org/#!/story/1629066). We have not prioritized addressing this as we have not had anyone come forward to say they needed it in their deployment. If this is an issue impacting your use case, please comment on the story to that effect and provide a use case. This will help the team prioritize this work.
Also, patches are welcome! If you are interested in working on this issue, I can help you with information about how this could be added.
It should also be noted that this is a limitation of 64,000 connections per backend server, not per load balancer.
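So, as a rough back-of-the-envelope illustration (ignoring the listener connection limit and the amphora's own resource limits), a load balancer with 10 backend members could track on the order of 10 x 64,000 = 640,000 concurrent connections.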
5. The team uses the #openstack-lbaas IRC channel on freenode and is happy to answer questions, etc.

To date, we have had limited resources (people and equipment) available to do performance evaluation and tuning. There are definitely kernel and HAProxy tuning settings we have evaluated and added to the Amphora driver, but I know there is more work that can be done. If you are interested in helping us with this work, please let us know.

Michael

P.S. Here are just a few considerations that can/will impact the performance of an Octavia Amphora load balancer:

- Hardware used for the compute nodes
- Network Interface Cards (NICs) used in the compute nodes
- Number of network ports enabled on the compute hosts
- Network switch configurations (jumbo frames, and so on)
- Cloud network topology (leaf-spine, fat-tree, and so on)
- The OpenStack Neutron networking configuration (ML2 and ML3 drivers)
- Tenant networking configuration (VXLAN, VLANs, GRE, and so on)
- Colocation of applications and Octavia amphorae
- Oversubscription of the compute and networking resources
- Protocols being load balanced
- Configuration settings used when creating the load balancer (connection limits, and so on)
- Version of OpenStack services (nova, neutron, and so on)
- Version of OpenStack Octavia
- Flavor of the OpenStack Octavia load balancer
- OS and hypervisor versions used
- Deployed security mitigations (Spectre, Meltdown, and so on)
- Customer application performance
- Health of the customer application

On Fri, Jul 19, 2019 at 8:52 AM Singh, Prabhjit <Prabhjit.Singh22 at t-mobile.com> wrote:
>
> Hi
>
>
>
> I have been trying to test Octavia with some traffic generators, and my
> tests have been inconclusive. I would appreciate your input on the following:
>
>
>
> It would be really nice to have some performance numbers that you have been able to achieve for this to be termed carrier grade.
> I would also appreciate it if you could share any input on performance tuning Octavia.
> Are there any recommended flavor sizes for spinning up amphorae? The default size of 1 core, 2 GB disk and 1 GB RAM does not seem enough.
> Also, I noticed that when the amphorae are spun up, only one master at a time is talking to the backend servers, and it has one IP that it is using, so it has to run out of ports after 64,000 concurrent TCP sessions. Is there a way to add more IPs, or is this a limitation?
> If I needed some help with Octavia and some guidance around performance tuning, can someone from the community help?
>
>
>
> Thanks & Regards
>
>
>
> Prabhjit Singh

