Hi Prabhjit,
Answers to the questions I can answer below. I hope you continue to work with your support contact to resolve the issues you are experiencing. Here I can only speak with my OpenStack community hat on.
Michael
On Fri, Sep 6, 2019 at 1:42 PM Singh, Prabhjit <Prabhjit.Singh22@t-mobile.com> wrote:
Hi Michael,
I have been trying to get Octavia LbaaS up and running and get performance tested. It has taken me some time to get quite a few things working.
While I continue to invest time in using Octavia and stay excited about some of the upcoming features, I have been asked the following questions by my leadership, to which I do not have any direct answers.
1. What is the adoption of Octavia? Are major organizations looking to adopt and invest in it? Can you provide some numbers?
I don't have much I can share here. You can look at the OpenStack user survey information: https://www.openstack.org/analytics though some of that is still fragmented as Octavia was part of neutron in some older releases. In the 2016 and 2017 survey, "Software load balancing" was the #1 neutron feature "actively used, interested in, or planned for use." Page 53: https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c08... Page 60: https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c08... You may also be interested in which companies have contributed to the project, which you can see on Stackalytics: https://www.stackalytics.com/?module=octavia-group
2. Roadmap-wise, is the open community committed to investing in Octavia, and why?
We do maintain a roadmap for longer-term goals: https://wiki.openstack.org/wiki/Octavia/Roadmap Beyond that, as OpenStack is an open community of many contributors, I cannot speculate on commitment.
3. Per your suggestion, I tried to look up the primary companies using Octavia and haven't found a clear indication. Any insight would be great.
That is really all I can share.
4. Would features from HAProxy 2.0 be included in Octavia?
Yes, it is on the roadmap. We have been waiting for 2.0.x to stabilize. The release timing of HAProxy 2.0 means that most of the major Linux distributions are not yet shipping it. This makes it a bit tricky for the OpenStack team as our testing standard is tied to these releases: https://governance.openstack.org/tc/reference/project-testing-interface.html... There is a chance that the OpenStack team will start adding features that need HAProxy 2.0.x in the Ussuri release cycle.
5. There are some open solutions from HAProxy, Envoy, and Consul. How would Octavia compare?
There are many, many different load balancing options available. As you know Octavia supports provider drivers, so that alternate technologies can be plugged in. For the reference amphora driver (the one we use for OpenStack testing), HAProxy was selected for its stability and wide support.
6. Lastly, do you have enough encouragement to keep the project going? I guess I am looking for some motivation for continuing to choose Octavia when there are several turnkey solutions (though offered at a price).
Well, personally I plan to keep working on Octavia. I am not the project team lead for the Train or Ussuri releases, but I am still an active core member. I am a "right tool for the right job" kind of person, so it really is up to you and your needs to balance the decision of which load balancing option to select.
Currently I have been working with Red Hat to answer the following questions. These are not for the community; hopefully Red Hat will be able to pursue them with your team.
With my OpenStack hat on and not speaking for Red Hat:
1. How to offload logs to an external log/metrics collector
This was a new feature for the Train release: https://docs.openstack.org/octavia/latest/admin/log-offloading.html
2. How to turn off logs during performance testing, I honestly do not want to do this because the performance tester is really generating live traffic which mimics a real time scenario.
https://docs.openstack.org/octavia/latest/configuration/configref.html#hapro...
3. How to set cron for rotating logs. I would think that this should be automatic. Would I need to do this every time?
Logs are already being rotated inside the amphora.
4. Do you have any way to increase the performance of the amphora? My take is that HAProxy can handle several thousand concurrent connections, but in our case it seems we hit a threshold at 3500 - 4500 connections, and then it starts to either send resets or the connections stay open for a long time.
Yes, I have had amphora do many more connections per second than that. There is some issue in your environment that is limiting it.
Thanks & Regards
Prabhjit
-----Original Message-----
From: Singh, Prabhjit
Sent: Tuesday, July 23, 2019 9:45 AM
To: Michael Johnson <johnsomor@gmail.com>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: [Octavia]-Seeking performance numbers on Octavia
Thanks so much for the valuable insights, Michael! I appreciate it, and keep up the good work. As I ramp up with more dev know-how, hopefully I can start making contributions and maybe convince my team to start as well.
Thanks & Regards
Prabhjit Singh
-----Original Message-----
From: Michael Johnson <johnsomor@gmail.com>
Sent: Monday, July 22, 2019 5:48 PM
To: Singh, Prabhjit <Prabhjit.Singh22@T-Mobile.com>
Cc: openstack-discuss@lists.openstack.org
Subject: Re: [Octavia]-Seeking performance numbers on Octavia
Hi Prabhjit,
Comments in-line below.
Michael
On Sun, Jul 21, 2019 at 5:24 PM Singh, Prabhjit <Prabhjit.Singh22@t-mobile.com> wrote:
Hi Michael,
Thanks for taking the time to send me your inputs and valuable suggestions. I do remember meeting you at the Denver Summit and attending a couple of your sessions. If you wouldn't mind, I do have a few more questions, and your answers would help me understand whether I should continue to invest in having Octavia as one of our available LBs.
1. Based on your response and the amount of time you are investing in supporting Octavia, what are some of the use cases? For example, if load balancing web traffic, what minimum number of transactions/connections can be expected? I do understand you mentioned that it's hard to performance test Octavia, but some real-world situations from your testing and how customers have adopted Octavia would help me level set some expectations.
This is really cloud and application specific. I would recommend you fire up an Octavia install and use your preferred tool to measure it. Some good tools are tsung, weighttp, and iperf3.
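Before reaching for one of those tools, a rough first check of where concurrent connections start failing can be sketched as below. This is only an illustrative probe, not a benchmark: the function name `probe_concurrent_connections`, its parameters, and the hold/timeout defaults are assumptions made for this example and are not part of Octavia or of any of the tools named above.

```python
import asyncio


async def _open_one(host, port, hold_seconds, timeout):
    """Open one TCP connection, hold it briefly, and report success."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, reset, unreachable, or timed out while connecting.
        return False
    try:
        # Keep the connection open so that all probes overlap in time.
        await asyncio.sleep(hold_seconds)
        return True
    finally:
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass


async def probe_concurrent_connections(host, port, count,
                                       hold_seconds=1.0, timeout=5.0):
    """Try to hold `count` connections open at once.

    Returns a tuple (opened, failed).
    """
    results = await asyncio.gather(
        *(_open_one(host, port, hold_seconds, timeout) for _ in range(count))
    )
    opened = sum(results)
    return opened, count - opened
```

A call such as `asyncio.run(probe_concurrent_connections(vip_address, 80, 5000))`, where `vip_address` is your load balancer VIP, returns the number of connections that opened and the number that failed. Keep in mind that the client's own open-file limit can cap the result before the load balancer does, so a dedicated tool is still the better measure.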
2. We are thinking of Octavia as one of our offerings, with a self-serve type model. Do you know of any customers who have been able to use Octavia as one of their primary load balancers, and is there any encouraging feedback you have gotten on Octavia?
There are examples of organizations using Octavia available if you google Octavia.
3. You suggested increasing the RAM size. I could go about making a whole new flavor.
Yes, to increase the allocated RAM for a load balancer, you would create an additional nova flavor with the specifications you would like. You can then either set this as the default nova flavor for amphora (amp_flavor_id is the setting) or you can create an Octavia flavor that specifies the nova compute flavor to use (See https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.opens... for more information on Octavia flavors).
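For the second option, the request body for an Octavia flavor profile that points at a nova flavor might look like the sketch below. It assumes the layout of the Octavia v2 flavor profile API as I understand it (`flavor_data` carried as a JSON string, and the amphora provider accepting a `compute_flavor` key); please verify against the flavors guide before relying on it. The helper name, the profile name, and the `NOVA_FLAVOR_UUID` placeholder are hypothetical.

```python
import json


def build_flavor_profile(name, compute_flavor_id):
    """Build a flavor profile body that selects a nova flavor for amphorae.

    Assumption: flavor_data is sent as a JSON-encoded string and the
    amphora provider understands the "compute_flavor" key.
    """
    flavor_data = {"compute_flavor": compute_flavor_id}
    return {
        "flavorprofile": {
            "name": name,
            "provider_name": "amphora",
            "flavor_data": json.dumps(flavor_data),
        }
    }


# NOVA_FLAVOR_UUID is a placeholder for the ID of the larger nova flavor.
body = build_flavor_profile("amphora-more-ram", "NOVA_FLAVOR_UUID")
print(json.dumps(body, indent=2))
```

An Octavia flavor that references this profile can then be offered to users, which leaves the default amp_flavor_id untouched for everyone else.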
4. I also noticed that in haproxy.conf the maxconn is set to 2000. Should I increase this? Does this affect the connections per server, which you said is 64000 conns per server, so if I have 10 servers can I expect somewhere close to 640000 sessions?
I think you are looking at the haproxy.conf file provided by your operating system package. Octavia does not use this file; it creates its own HAProxy configuration files as needed under /var/lib/octavia inside the amphora. The default connection limit, if the user does not specify one at listener creation, is 1,000,000.
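The arithmetic behind the two limits mentioned in this thread can be sketched as follows. This is a theoretical ceiling only, under the assumption that each client connection maps to one backend connection; RAM, CPU, and the cloud network will usually bound real capacity well before these numbers. The function name and constants are illustrative.

```python
# Per-backend-server figure quoted in this thread.
PER_MEMBER_LIMIT = 64_000
# Default listener connection limit when none is specified at creation.
DEFAULT_LISTENER_LIMIT = 1_000_000


def connection_ceiling(member_count,
                       listener_limit=DEFAULT_LISTENER_LIMIT,
                       per_member_limit=PER_MEMBER_LIMIT):
    """Return the smaller of the listener limit and the summed member limits."""
    backend_total = member_count * per_member_limit
    return min(listener_limit, backend_total)


print(connection_ceiling(10))  # 640000: bounded by the backend members
print(connection_ceiling(20))  # 1000000: bounded by the listener limit
```

With 10 members the backend side is the tighter bound; at 16 or more members the default listener limit becomes the tighter one unless a higher limit is set when the listener is created.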
5. Based on some of the limitations and the dev work in progress, I think the most important feature that would make Octavia a really solid offering would be the Active-Active and autoscaling feature. I brought this up with you in our brief conversation at the summit, and you did mention that it's not a top priority at this time and that you are looking for some help. I have noticed a lot of documentation has been updated on this feature. Do you think that, with the available documentation and progress, I could spin up a distributor and manage sessions between amphorae, or is it not complete yet?
Active/Active is still on our roadmap, but unfortunately the people that were working on it had to stop for personal reasons. There may be some folks picking up this work again soon. At this point the Active/Active patches up for review are non-functional and still a work in progress.
6. We have a TripleO setup. Do you think I can make the above tweaks with TripleO?
I think you are able to make various adjustments to Octavia with TripleO, but I do not have specifics on that.
Thanks & Regards
Prabhjit Singh Systems Design and Strategy - Magentabox | O: (973) 397-4819 | M: (973) 563-4445
-----Original Message-----
From: Michael Johnson <johnsomor@gmail.com>
Sent: Friday, July 19, 2019 6:00 PM
To: Singh, Prabhjit <Prabhjit.Singh22@T-Mobile.com>
Cc: openstack-discuss@lists.openstack.org
Subject: Re: [Octavia]-Seeking performance numbers on Octavia
Hi Prabhjit,
As you have mentioned, it is very challenging to get accurate performance results in cloud environments. There are a large number (very large, in fact) of factors that can impact the overall performance of OpenStack and Octavia.
In our OpenDev testing environment, we only have software emulation virtual machines available (Qemu running with the TCG engine), which perform extremely poorly. This means that the testing environment does not reflect how the software is used in real-world deployments. For example, simply booting a VM can take up to ten minutes on Qemu with TCG, whereas it takes about twenty seconds on a real OpenStack deployment. With this resource limitation, we cannot effectively run performance benchmarking test jobs on the OpenDev environment.
Because of this, we don't publish performance numbers as they will not reflect what you can achieve in your environment.
Let me try to speak to your bullet points:
1. The Octavia team has never (to my knowledge) claimed the Amphora driver is "carrier grade". We do consider the Amphora driver to be "operator grade", which speaks to a cloud operator's perspective versus the previous offering that did not support high availability, have appropriate maintenance tooling, upgrade paths, performance, etc. To me, "carrier grade" has an additional level of requirements including performance, latency, scale, and availability SLAs. This is not what the Octavia Amphora driver is currently ready for. That said, third party provider drivers for Octavia may be able to provide a "carrier grade" level of load balancing for OpenStack.
2. As for performance tuning, much of this is either automatically handled by Octavia or is dependent on the application you are load balancing and your cloud deployment. For example, we have many configuration settings to tune how many retries we attempt when interacting with other services. In well-performing and stable clouds, these can be tuned down; in others, the defaults may be appropriate. If you would like faster failover, at the expense of slightly more network traffic, you can tune the health monitoring and keepalived_vrrp settings. We do not currently have a performance tuning guide for Octavia but would support someone authoring one.
3. We do not currently have a guide for this. I will say that with the version of HAProxy currently being shipped with the distributions, going beyond 1 vCPU per amphora does not gain you much. With the release of HAProxy 2.0 this has changed, and we expect to be adding support for vertically scaling the Amphora in future releases. Disk space is only necessary if you are storing the flow logs locally, which I would not recommend for a performance load balancer (See the notes in the log offloading guide: https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.opens...). Finally, the RAM usage is a factor of the number of concurrent connections and whether you are enabling TLS on the load balancer. For typical load balancing loads, the default is typically fine. However, if you have high connection counts and/or TLS offloading, you may want to experiment with increasing the available RAM.
4. The source IP issue is a known issue (https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstoryboard...). We have not prioritized addressing this as we have not had anyone come forward to say that they need this in their deployment. If this is an issue impacting your use case, please comment on the story to that effect and provide a use case. This will help the team prioritize this work. Also, patches are welcome! If you are interested in working on this issue, I can help you with information about how this could be added. It should also be noted that it is a limitation of 64,000 connections per backend server, not per load balancer.
5. The team uses the #openstack-lbaas IRC channel on freenode and is happy to answer questions, etc.
To date, we have had limited resources (people and equipment) available to do performance evaluation and tuning. There are definitely kernel and HAProxy tuning settings we have evaluated and added to the Amphora driver, but I know there is more work that can be done. If you are interested in helping us with this work, please let us know.
Michael
P.S. Here are just a few considerations that can/will impact the performance of an Octavia Amphora load balancer:
- Hardware used for the compute nodes
- Network Interface Cards (NICs) used in the compute nodes
- Number of network ports enabled on the compute hosts
- Network switch configurations (Jumbo frames, and so on)
- Cloud network topology (leaf-spine, fat-tree, and so on)
- The OpenStack Neutron networking configuration (ML2 and ML3 drivers)
- Tenant networking configuration (VXLAN, VLANS, GRE, and so on)
- Colocation of applications and Octavia amphorae
- Over subscription of the compute and networking resources
- Protocols being load balanced
- Configuration settings used when creating the load balancer (connection limits, and so on)
- Version of OpenStack services (nova, neutron, and so on)
- Version of OpenStack Octavia
- Flavor of the OpenStack Octavia load balancer
- OS and hypervisor versions used
- Deployed security mitigations (Spectre, Meltdown, and so on)
- Customer application performance
- Health of the customer application
On Fri, Jul 19, 2019 at 8:52 AM Singh, Prabhjit <Prabhjit.Singh22@t-mobile.com> wrote:
Hi
I have been trying to test Octavia with some traffic generators and my tests are inconclusive. I would appreciate your inputs on the following:
1. It would be really nice to have some performance numbers that you guys have been able to achieve for this to be termed carrier grade.
2. I would also appreciate it if you could share any inputs on performance tuning Octavia.
3. Are there any recommended flavor sizes for spinning up amphorae? The default size of 1 core, 2 GB disk and 1 GB RAM does not seem to be enough.
4. I also noticed that when the amphorae are spun up, only one master at a time is talking to the backend servers, and it has one IP that it's using, so it has to run out of ports after 64000 concurrent TCP sessions. Is there a way to add more IPs, or is this the limitation?
5. If I needed some help with Octavia and some guidance around performance tuning, can someone from the community help?
Thanks & Regards
Prabhjit Singh