Hi there,
I'm currently exploring some strategies and seeking advice on the deployment and layout of services on an 8-node cluster. Each node is equipped with dual AMD EYPC 7702s, two 100G Links, and 2TB of RAM. Additionally, I've set up two 13TB NVME drives in a Ceph configuration for each node.
At present, the setup involves one management node, while the others are dedicated to running Nova, Neutron, Glance, and Cinder. The workload, while fairly basic, involves handling substantial numbers. It's primarily used for one-off research projects and a few classes, serving around 60 end users.
The most demanding scenario occurs during cybersecurity competitions when we need to deploy 3100 VMs at once. When this is spread out over a few days, it's manageable. However, there's a risk of overloading the system, leading to DiskIO issues and Database inconsistencies.
Currently, I'm utilizing Kolla for deployment, and I've shared my configurations on this GitHub repository: Link to Configs.
I'd greatly appreciate any insights or advice on optimizing this setup to handle the peak workload more efficiently and prevent potential performance issues.
Thanks in advance for your help!