Hi,
In our environment (Kolla-Ansible Ussuri, 1200+ hypervisors), we are experiencing performance issues during bulk VM deployments. Launching around 100 VMs takes approximately 15 minutes, and nova-scheduler spends nearly all of that time calculating available pinned CPUs.
However, in our Yoga production deployment, the same operation is about 10 times faster — around 1–2 minutes for 100 VMs.
Additionally, once the bulk launch has finished, we have observed that some instances fail to be created, and some of the instances that are created receive multiple (2–3) IP addresses from Neutron instead of one.
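To put a number on the multiple-IP symptom, a small helper like the one below could be run over the port records after a bulk launch. This is only a sketch: it assumes the ports are already available as dicts carrying the Neutron port fields `device_id`, `device_owner` and `fixed_ips`, the function name is hypothetical, and the sample data is made up rather than taken from our deployment. On dual-stack networks more than one IP per instance is expected, so the result would need filtering accordingly.

```python
from collections import defaultdict


def instances_with_multiple_ips(ports):
    """Return {device_id: [ip, ...]} for instances holding more than one fixed IP.

    `ports` is a list of dicts shaped like Neutron port records; only the
    `device_id`, `device_owner` and `fixed_ips` fields are used.
    """
    ips_by_instance = defaultdict(list)
    for port in ports:
        # Only consider ports bound to compute instances (e.g. "compute:nova").
        if not port.get("device_owner", "").startswith("compute:"):
            continue
        device_id = port.get("device_id")
        if not device_id:
            continue
        for fixed_ip in port.get("fixed_ips", []):
            ips_by_instance[device_id].append(fixed_ip["ip_address"])
    # Keep only instances with more than one IP across all of their ports.
    return {dev: ips for dev, ips in ips_by_instance.items() if len(ips) > 1}


# Hypothetical sample data for illustration only.
sample_ports = [
    {"device_id": "vm-a", "device_owner": "compute:nova",
     "fixed_ips": [{"ip_address": "10.0.0.5"}]},
    {"device_id": "vm-b", "device_owner": "compute:nova",
     "fixed_ips": [{"ip_address": "10.0.0.6"}]},
    {"device_id": "vm-b", "device_owner": "compute:nova",
     "fixed_ips": [{"ip_address": "10.0.0.7"}]},
    {"device_id": "dhcp-1", "device_owner": "network:dhcp",
     "fixed_ips": [{"ip_address": "10.0.0.2"}, {"ip_address": "10.0.0.3"}]},
]

print(instances_with_multiple_ips(sample_ports))
```

Grouping by `device_id` rather than by port catches both cases: a single port with several fixed IPs, and several ports attached to the same instance.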
Our Nova services run with a NUMA-aware, CPU-pinning configuration; all other settings are close to the defaults.
Below is an excerpt from the nova-scheduler logs during VM creation (this message is repeated thousands of times within 15 minutes):
Do you have any suggestions on what might be causing the following?
1. The scheduler spending so much time calculating pinned CPUs in Ussuri.
2. Some VMs getting multiple IPs from Neutron after creation.
Any guidance would be greatly appreciated.
Best regards,
İzzettin