Hi Forrest,
Based on the configuration you specified, I'll assume 1.8TB per node is assignable to VM resources, excluding the management node (though honestly, why not just share that role over the cluster? It doesn't seem like this is a production workload.)
That gives (1.8TB * 7) * 1000 = 12,600GB of usable RAM with no oversubscription (leaving roughly 180-200GB on each system for the host and Ceph services).
Therefore 12600 / 3100 = ~4GB of RAM per VM at peak load, with no oversubscription.
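As a quick sanity-check sketch of that RAM math (the 1.8TB-per-node, 7-node, and 3100-VM figures are the assumptions above):

```python
# Back-of-the-envelope RAM check: assumes 1.8 TB usable per node,
# 7 compute nodes, 3100 VMs, and no RAM oversubscription.
usable_per_node_gb = 1.8 * 1000   # 1.8 TB expressed in GB
nodes = 7
vms = 3100

total_ram_gb = usable_per_node_gb * nodes   # 12,600 GB across the cluster
ram_per_vm_gb = total_ram_gb / vms          # ~4 GB per VM at peak

print(f"Total usable RAM: {total_ram_gb:.0f} GB")
print(f"RAM per VM at peak: {ram_per_vm_gb:.2f} GB")
```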
Based on the CPU arch:
128 cores * 2 threads = 256 threads per node, which in this model I would count as vCPU cores (there's a LOT of contention about what defines a vCPU; there are many camps and religious arguments on this)…
However, we want to leave some headroom for the host, so as a general thought I will use 250 vCPUs per node in this calculation.
250 * 7 = 1,750 vCPU cores available.
That leaves a deficit of 3100 - 1750 = 1,350 vCPU cores.
As such, to give each VM 1 vCPU at peak workload, the oversubscription ratio needs to be 3100 / 1750 ≈ 1.8.
1750 * 1.8 = 3,150 vCPUs available.
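The same sketch for the vCPU side (again, the 250-vCPU-per-node headroom figure is the assumption above):

```python
# vCPU oversubscription check: assumes 250 usable vCPUs per node after
# host headroom, 7 nodes, and 3100 VMs each needing 1 vCPU at peak.
import math

vcpus_per_node = 250
nodes = 7
vms = 3100

physical_vcpus = vcpus_per_node * nodes     # 1750
deficit = vms - physical_vcpus              # 1350 vCPUs short at 1:1
ratio = vms / physical_vcpus                # ~1.77
ratio_rounded = math.ceil(ratio * 10) / 10  # round up to 1.8

schedulable = round(physical_vcpus * ratio_rounded)  # 3150 with ratio 1.8
print(f"Deficit: {deficit}, required ratio: {ratio_rounded}, "
      f"schedulable vCPUs: {schedulable}")
```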
Honestly, depending on these workloads and your requirements, you could probably push this subscription ratio up further. And again, I would instead share the management workloads over the cluster and induct the first node as another compute resource.
I would like to know more about the workloads: what is the flavour of each VM? Is the workload disk-, CPU-, memory-, or IO-intensive?
In terms of running the workload, I don't see an obvious issue with your nova config; the allocation ratio is high, but based on the numbers above that's probably not going to be an issue here.
Reserving host memory is good: I can see you have around 128GB in reservation, and with ram_allocation_ratio at 1.0 that will ensure no OOM scenarios.
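For reference, the relevant nova.conf knobs would look roughly like this (the values just mirror the numbers above; adjust to your actual reservation):

```ini
[DEFAULT]
# Keep ~128 GB back for the host and Ceph services
reserved_host_memory_mb = 131072
# No RAM oversubscription -- avoids OOM scenarios at peak
ram_allocation_ratio = 1.0
# 1.8 covers the 3100-VM peak per the calculation above
cpu_allocation_ratio = 1.8
```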
Overall, I think your settings are sane. My only comment would be to add that management node into the compute mix if you can; otherwise, without knowing more about the workload, this seems fine.
Have you done any scale testing, i.e. spun up a workload of 3,100 VMs? (The easiest way to do this would likely be a Heat template…) Then run an image with a default startup workload of whatever you want to test with.
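A minimal Heat template for that kind of scale test might look like this (image, flavor, and network names are placeholders for whatever test flavour you settle on; you may also want to ramp vm_count up in stages rather than all at once):

```yaml
heat_template_version: 2018-08-31
description: Scale test -- boots N identical servers via a ResourceGroup

parameters:
  vm_count:
    type: number
    default: 3100
  image:
    type: string   # e.g. a small test image such as CirrOS
  flavor:
    type: string   # e.g. a 1 vCPU / 4 GB flavor matching the math above
  network:
    type: string

resources:
  scale_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: vm_count }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: { get_param: image }
          flavor: { get_param: flavor }
          networks:
            - network: { get_param: network }
```

Then something like `openstack stack create -t scale_test.yaml --parameter image=<image> --parameter flavor=<flavor> --parameter network=<network> scale-test` would kick it off.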
--Karl.