[octavia] usage of SELECT .. FOR UPDATE
Hi everyone,

We're being hit particularly hard across different deployments where Octavia has several SELECT .. FOR UPDATE queries that are causing load balancers to fail to provision properly.

- spare_pools: This usually hits on rolling restarts of o-housekeeping, as the instances all seem to try to capture the same lock -- https://github.com/openstack/octavia/blob/73fbc05386b512aa1dd86a0ed6e8455cc6...
- quota: This hits when provisioning a lot of load balancers in parallel, for example when using Heat -- https://github.com/openstack/octavia/blob/bf3d5372b9fc670ecd08339fa989c9b738...

These hurt quite a lot in a busy deployment and unfortunately result in a poor user experience. We're trying to off-load Octavia to its own database server, but that is more of a "throw power at the problem" solution. Could we look into a better/cleaner alternative that avoids this entirely?

I'm happy to try and push for some of this work on our side.

Thanks,
Mohammed

--
Mohammed Naser
VEXXHOST, Inc.
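The locking pattern behind both items can be sketched as a check-then-act on a single shared row. This is a minimal, stdlib-only sketch under stated assumptions, not Octavia's code: a `threading.Lock` stands in for the database row lock that SELECT ... FOR UPDATE would take, `QuotaRow`, `try_reserve` and `provision_burst` are hypothetical names, and `hold_seconds` is an assumed per-request time spent holding the lock. It shows why the lock keeps usage from overshooting the limit, and why a parallel burst (for example, from a Heat stack) is forced to queue behind it.

```python
import threading
import time


class QuotaRow:
    """Hypothetical per-project quota row guarded by a row lock."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        # Stand-in for the database row lock taken by SELECT ... FOR UPDATE.
        self.lock = threading.Lock()
        # Bookkeeping used only to measure how many callers are inside the
        # critical section at once; it has its own lock so the measurement
        # stays correct independently of self.lock.
        self._stats_lock = threading.Lock()
        self._holders = 0
        self.max_concurrent_holders = 0

    def try_reserve(self, amount=1, hold_seconds=0.01):
        """Check-then-increment under the lock; returns True if reserved."""
        with self.lock:
            with self._stats_lock:
                self._holders += 1
                self.max_concurrent_holders = max(
                    self.max_concurrent_holders, self._holders)
            try:
                # Simulated time spent holding the lock (query latency, etc.).
                time.sleep(hold_seconds)
                if self.in_use + amount > self.limit:
                    return False  # over quota: reject
                self.in_use += amount
                return True
            finally:
                with self._stats_lock:
                    self._holders -= 1


def provision_burst(row, requests):
    """Fire `requests` concurrent reservations against the same row."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = row.try_reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


start = time.monotonic()
row = QuotaRow(limit=5)
results = provision_burst(row, requests=8)
elapsed = time.monotonic() - start  # at least 8 * hold_seconds: holders run one at a time
print(row.in_use, results.count(True), results.count(False),
      row.max_concurrent_holders)  # prints: 5 5 3 1
```

Every request, accepted or rejected, holds the lock in turn, so the burst takes at least `requests * hold_seconds` of wall time no matter how many workers run in parallel. A similar guard would apply to the spares pool count, assuming it is protected the same way, with the number of spare amphorae in place of a project's quota usage.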
Hi Mohammed,

Have you opened stories for these issues? I haven't seen any bug reports about this. If not, could you capture your information in stories for us to work against? I am not sure I follow the issue fully, so hopefully we can clarify.

When the spares pool is enabled, housekeeping boots spare amphora VMs. I'm not sure how that could prevent load balancers from provisioning. Sure, some periodic jobs in the housekeeping process may deadlock and not finish booting spare VMs, but this will not block any load balancer provisioning. If there are no spares available, the worker will simply boot a VM as it normally would without spares enabled. (This was functionality we added to TaskFlow from the beginning to make sure load balancer provisioning would not be blocked if the spares pool was depleted.) The spares pool lock was added at an operator's request, as they did not want any "extra" amphorae booted beyond the configured spares pool limit.

The quota management does lock the project during the critical phase of managing its quota, just like every OpenStack project. If that is not completing the quota update in a timely manner, please open a story with the logs so we can investigate.

I assume your application is correctly designed to handle an asynchronous API (such as neutron, Octavia, etc.), that is, it handles any response indicating the object is currently immutable and retries the request.

Michael

(In reply to Mohammed Naser <mnaser@vexxhost.com>, Sun, Aug 30, 2020 at 8:47 AM)
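On the client side, handling an asynchronous API as described above usually means retrying when the object is temporarily immutable. A minimal sketch, assuming the API reports that condition as HTTP 409 Conflict (for example, while a load balancer is in a PENDING_* state); `Response`, `call_with_retry` and `make_fake_api` are hypothetical names, not part of any OpenStack client library, and the backoff parameters are illustrative. The retry uses capped exponential backoff with full jitter so that many clients retrying at once do not all hit the API in lockstep.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical minimal HTTP response: status code plus body text."""
    status: int
    body: str = ""


# Assumption: "object is currently immutable" is reported as 409 Conflict.
IMMUTABLE_STATUS = 409


def call_with_retry(send, max_attempts=6, base_delay=0.5, max_delay=10.0,
                    sleep=time.sleep):
    """Call send() and retry while it reports the object as immutable.

    Capped exponential backoff with full jitter: before retry n the client
    sleeps a random time in [0, min(max_delay, base_delay * 2**n)].
    Any non-409 response (success or another error) is returned as-is.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        if resp.status != IMMUTABLE_STATUS:
            return resp
        if attempt < max_attempts - 1:
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    return resp  # still immutable after all attempts; caller decides


def make_fake_api(conflicts):
    """Hypothetical API stub: 409 for the first `conflicts` calls, then 200."""
    calls = {"n": 0}

    def send():
        calls["n"] += 1
        if calls["n"] <= conflicts:
            return Response(409, "load balancer is immutable")
        return Response(200, "ok")

    return send, calls


send, calls = make_fake_api(conflicts=2)
delays = []
resp = call_with_retry(send, sleep=delays.append)  # record delays instead of sleeping
print(resp.status, calls["n"], len(delays))  # prints: 200 3 2
```

Injecting `sleep` lets a test record the delays instead of waiting on them, and returning the last 409 after `max_attempts` leaves the give-up decision to the caller.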
Participants (2): Michael Johnson, Mohammed Naser