[openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology

Daniel P. Berrange berrange at redhat.com
Tue Dec 3 10:20:14 UTC 2013

On Mon, Dec 02, 2013 at 11:05:02PM -0800, Vui Chiap Lam wrote:
> Hi Daniel,
> I too found the original bp a little hard to follow, so thanks for
> writing up the wiki! I see that the wiki is now linked to the BP, 
> which is great as well.
> The ability to express CPU topology constraints for the guests
> has real-world use, and several drivers, including VMware, can definitely 
> benefit from it.
> If I understand correctly, in addition to being an elaboration of the
> BP text, the wiki also adds the following:
> 1. Instead of returning the best-matching (num_sockets (S),
>    cores_per_socket (C), threads_per_core (T)) tuple, all applicable
>    (S,C,T) tuples are returned, sorted by S then C then T.
> 2. A mandatory topology can be provided in the topology computation.
> I like 2. because there are multiple reasons why all of a hypervisor's
> CPU resources cannot be allocated to a single virtual machine. 
> Given that the mandatory (I prefer maximal) topology is probably fixed
> per hypervisor, I wonder whether this information should also be used
> at scheduling time to eliminate incompatible hosts outright.

The host exposes info about the vCPU count it is able to support and the
scheduler picks on that basis. The guest image is just declaring upper
limits on the topology it can support. So if the host is able to support
the guest's vCPU count, then the CPU topology decision should never cause
any boot failure. As such, CPU topology has no bearing on scheduling, which
is good because I think it would significantly complicate the problem.

> As for 1. because of the order of precedence of the fields in the
> (S,C,T) tuple, I am not sure how the preferred_topology comes into
> play. Is it meant to help favor alternative values of S?

> Also it might be good to describe a case where returning a list of
> (S,C,T) instead of best-match is necessary. It seems deciding what to
> pick other than the first item in the list requires logic similar to
> that used to arrive at the list in the first place.

It is really all about considering NUMA implications. If you prefer
cores and your VM RAM crosses a NUMA node boundary, then you sacrifice
performance. So if you know the VM RAM will have to span NUMA nodes, then
you may set a lower cores limit to force a topology spanning multiple
sockets to be returned. By returning a list of acceptable topologies, the
virt driver can then have some flexibility in deciding how to pin guest
CPUs / RAM to host NUMA nodes, and/or expose guest visible NUMA topology.

eg if the returned list gives a choice of

   (2 sockets, 2 cores, 1 thread)
   (1 socket, 4 cores, 1 thread)

then the virt driver can now choose whether to place the guest inside
a single NUMA node, or spread it across nodes, and still expose sane
NUMA topology info to the guest. You could say we should take account
of NUMA straight away at the time we figure out the CPU topology, but
I believe that would complicate this code and make it impractical to
share the code across drivers.
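
To make the "list of acceptable topologies" idea concrete, here is a rough
sketch in Python. It is not the actual Nova code: the function name
possible_topologies() and its parameters are hypothetical, and the ordering
(more sockets first, then more cores, then more threads) is an assumption
chosen to match the example list above.

```python
from itertools import product


def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """Return every (sockets, cores, threads) tuple that yields `vcpus`.

    Hypothetical helper: the limits stand in for the upper bounds declared
    by the guest image / flavor.
    """
    found = []
    for sockets, cores, threads in product(range(1, max_sockets + 1),
                                           range(1, max_cores + 1),
                                           range(1, max_threads + 1)):
        # Only keep combinations that account for exactly all vCPUs.
        if sockets * cores * threads == vcpus:
            found.append((sockets, cores, threads))
    # Assumed preference: sort by S, then C, then T, largest first.
    found.sort(reverse=True)
    return found


# 4 vCPUs, at most 2 sockets, 4 cores, 1 thread: both layouts are acceptable.
print(possible_topologies(4, max_sockets=2, max_cores=4, max_threads=1))

# Lowering the cores limit leaves only the layout spanning two sockets.
print(possible_topologies(4, max_sockets=2, max_cores=2, max_threads=1))
```

The second call shows the effect of setting a lower cores limit: the
single-socket layout drops out and only the multi-socket one is returned.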

If a virt driver doesn't care to do anything with the list of possible
topologies though, it can simply ignore it and always take the first
element in the list. This is what we'll do in libvirt initially, but
we want to do intelligent automatic NUMA placement later to improve the
performance utilization of hosts.
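
A minimal sketch of those two driver behaviours, again in Python and purely
illustrative. The name pick_topology() and the host_node_cpus parameter are
hypothetical, and the NUMA rule used here (each guest socket must fit inside
one host NUMA node) is only an assumed stand-in for whatever placement logic
a driver might really apply.

```python
def pick_topology(topologies, host_node_cpus=None):
    """Choose one (sockets, cores, threads) tuple from a preference list.

    With no NUMA information the first (most preferred) entry wins.
    If host_node_cpus is given, assume each guest socket is pinned to one
    host NUMA node and take the first entry whose socket fits in a node.
    """
    if not topologies:
        raise ValueError("no acceptable topology")
    if host_node_cpus is None:
        # Simple driver: ignore the alternatives entirely.
        return topologies[0]
    for sockets, cores, threads in topologies:
        if cores * threads <= host_node_cpus:
            return (sockets, cores, threads)
    # Nothing fits in a single node; fall back to the preferred entry.
    return topologies[0]


choices = [(1, 4, 1), (2, 2, 1)]
print(pick_topology(choices))                    # takes the first element
print(pick_topology(choices, host_node_cpus=2))  # skips the 4-core socket
```

The point is only that the selection step can stay small and per-driver,
while the code that builds the list is shared.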

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
