[largescale-sig]scaling story
Hi everyone,
We want to share our scaling story. Please refer to the link: https://etherpad.opendev.org/p/large-scale-inspur
Thanks,
Alex Song
Hi Alex,
Thanks for the great write up! I would love to see more of this.
How did you adjust the maximum number of connections for RabbitMQ? And for the relay, I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?
Thanks,
Mohammed
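For context, the ovsdb-relay tutorial linked above starts a relay roughly as follows. This is a sketch, not the poster's actual deployment: the IP addresses and ports are placeholders, and the second command assumes compute nodes run ovn-controller against the relay.

```shell
# Start an OVSDB relay for the OVN Southbound DB, mirroring a central
# ovsdb-server (placeholder address 192.0.2.10, standard SB port 6642).
ovsdb-server --remote=ptcp:16642 relay:OVN_Southbound:tcp:192.0.2.10:6642

# On compute nodes, point ovn-controller at the relay instead of the
# central Southbound DB (placeholder relay address 192.0.2.20).
ovs-vsctl set open . external-ids:ovn-remote=tcp:192.0.2.20:16642
```

Multiple relays can be run and compute nodes spread across them, which is how the connection fan-in to the central Southbound DB is reduced.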
That sounds like a great result at large scale! Thank you for this story. Would you mind sharing more details / config params / changes you made to the code?

I am surprised by max_connections = 100000, for example. We identified on our side that having too many connections to the DB resulted in memory/fd exhaustion.

Also, one question about the placement 1000-host limit: is it because you request to spawn 3k instances in one request?
Cheers,
Arnaud
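For reference, the two knobs discussed in this thread most likely map to the MySQL/MariaDB server and nova configuration below. This is a hedged sketch: the values come from the thread, but the assumption that these are the exact options the Inspur team changed is mine.

```ini
# my.cnf — raise the server-side connection cap discussed above
# (the OS open-files limit for mysqld must be raised to match)
[mysqld]
max_connections = 100000

# nova.conf — nova caps the allocation candidates it requests from
# placement at 1000 by default; 3k instances per request needs more
[scheduler]
max_placement_results = 3000
```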
On 13.06.24 - 11:47, Mohammed Naser wrote:
Hi Arnaud,

For the first question: in the operating system, maintaining a connection typically takes 10-100 KB of memory, so 100,000 connections may use 1-10 GB, which will not exhaust memory. During our 3000-node large-scale test, we measured that the number of database connections peaked at around 15,000 and stayed at around 4,500 in the steady state, so the connections themselves consume less than 1 GB of memory. Including the query cache, temporary tables, and buffer pools, we measured the maximum memory usage of the database service process at around 25 GB, which is less than 10% of the control node's total memory. Therefore, it will not cause memory exhaustion.

For the second question: yes, it is because we spawn 3k instances in one request. Placement returns 1000 allocation candidates by default, so we needed to increase the limit to 3000.

For deployment optimization, we modified the Ansible module on top of openstack-helm to support user-defined configuration, making it more convenient to modify OpenStack configuration parameters. Additionally, we addressed the kube-apiserver load-balancing problem in large-scale scenarios by adjusting the kubelet client's long-connection strategy so that it reconnects randomly, keeping the load balanced across all management nodes.

The etherpad https://etherpad.opendev.org/p/large-scale-inspur shows more details / config params / changes to the code.
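The back-of-envelope math in the reply above can be checked in a few lines. The constants are the figures quoted in the thread (10-100 KB per connection, 15,000 peak and 4,500 steady-state connections), not independent measurements:

```python
KB = 1024
GB = 1024 ** 3

per_conn_bytes = 100 * KB   # pessimistic per-connection cost from the reply
peak_conns = 15_000         # peak observed during the 3000-node test
stable_conns = 4_500        # steady-state connection count

peak_mem = peak_conns * per_conn_bytes      # worst case at the peak
stable_mem = stable_conns * per_conn_bytes  # steady state, under 1 GiB

print(f"peak: {peak_mem / GB:.2f} GiB, stable: {stable_mem / GB:.2f} GiB")
```

Even with the pessimistic 100 KB figure, the steady state stays well under 1 GiB and the peak under 1.5 GiB, consistent with the claim that connections are a small fraction of the 25 GB database process footprint.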
participants (4)

- Alex Song (宋文平)
- Arnaud Morin
- Mohammed Naser
- songwenping@inspur.com