Just an update on this. I have found a 'fix', though it is really a workaround, since I still don't know what causes the problem.

So, to start with, all new VMs are being spawned on a single node. I can live migrate VMs to the other node, but the scheduler never places new VMs there.

To fix this, I disable the hypervisor on the node that is getting all the VMs, then I spawn a bunch of VMs. They go to the node that still has its hypervisor enabled. Then I re-enable the disabled hypervisor and spawn a bunch of VMs. Now they get evenly split across the two nodes.
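
For anyone who wants to script those steps, here is a rough sketch. It assumes an OpenStack Nova deployment driven through the `openstack` CLI, and that "disabling the hypervisor" corresponds to disabling the nova-compute service on that host. The host, flavor, image and network values and the server name prefixes are placeholders I made up for illustration, not details of my setup. It only prints the commands unless dry_run is turned off.

```python
import shlex
import subprocess


def workaround_commands(busy_host, flavor, image, network, count=4):
    """Build the CLI calls for the disable / spawn / re-enable / spawn sequence.

    All argument values are placeholders supplied by the caller.
    """
    def spawn(name_prefix):
        # Ask for `count` servers in a single request (--min/--max).
        return [
            "openstack", "server", "create",
            "--flavor", flavor,
            "--image", image,
            "--network", network,
            "--min", str(count),
            "--max", str(count),
            name_prefix,
        ]

    return [
        # 1. Stop the scheduler from using the node that receives every VM.
        ["openstack", "compute", "service", "set", "--disable",
         busy_host, "nova-compute"],
        # 2. Spawn VMs; they can only land on the node that is still enabled.
        spawn("rebalance-a"),
        # 3. Re-enable the node that was disabled in step 1.
        ["openstack", "compute", "service", "set", "--enable",
         busy_host, "nova-compute"],
        # 4. Spawn again; in my case these were split across both nodes.
        spawn("rebalance-b"),
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run(workaround_commands("node1", "m1.small", "cirros", "private")) prints the four commands in order so they can be checked before running them for real with dry_run=False.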

Fixed.

--
MC