[openstack-dev] Host CPU feature exposure

Dugger, Donald D donald.d.dugger at intel.com
Thu Jan 24 00:05:58 UTC 2013

I don't know that we've bottomed out on this subject, so I guess I'll just try to make some forward progress based upon the ideas in this thread.  I'll see if I can come up with a way of exposing to the scheduler the capabilities that a VM would see, and then we'll go from there.

Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Dugger, Donald D 
Sent: Friday, January 11, 2013 12:43 PM
To: Daniel P. Berrange
Cc: OpenStack Development Mailing List
Subject: RE: [openstack-dev] Host CPU feature exposure

I'm trying carefully not to specify CPUID feature bits since, as you say, that would be x86 specific.  What I'm looking for is the ability to define platform capabilities and then be able to schedule an instance on a platform that supports specific capabilities.  Currently the compute nodes only expose CPU feature bits, but I would definitely like to extend this to report other capabilities (things like the ability to directly assign SR-IOV devices come to mind, although there are a lot of issues that would need to be addressed before we can truly support that).  I'm assuming that other architectures, like PPC or ARM, also have different capabilities, and these capabilities should be reported to the scheduler just as in the x86 case.

I don't have a problem with the current way of exposing capabilities (a set of strings that are reported to the scheduler).  The problem I have right now is that only some features are exposed, e.g. on my machine the scheduler knows about the feature string `rdtscp' but it doesn't know about the feature string `aes' even though both features are available on this host.
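To make the problem concrete, here is a minimal sketch of the kind of feature-string matching a scheduler filter does; the names (`host_passes`, `required_features`) are illustrative, not actual Nova API:

```python
# Hypothetical sketch: a scheduler filter matching required feature strings
# against the set of capability strings a host reports.

def host_passes(host_capabilities, required_features):
    """Return True if every required feature string is advertised by the host."""
    return set(required_features) <= set(host_capabilities)

# A host that reports only a subset of its real features:
reported = {"sse", "sse2", "rdtscp"}              # 'aes' missing even if present
print(host_passes(reported, ["aes"]))             # False: host wrongly filtered out
print(host_passes(reported | {"aes"}, ["aes"]))   # True once all features are exposed
```

The point of issue 1 below is exactly this: with only a subset of features reported, the membership test fails for hosts that could actually satisfy the request.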

I think I see 3 issues here:

1)  Exposing all feature capabilities to the scheduler.  Currently we only expose a subset; I'd like to see all capabilities exposed.

2)  Report the capabilities for the guest VM, not the host itself.  Given the current libvirt this could get a little tricky, as there's no simple API to do this.  The libvirt code in the compute node does know, from the `libvirt_cpu_mode' parameter, what type of CPU the guest will see, so that code could interpret the `cpu_map.xml' file to extract all of the capabilities and report them.  I'm not too happy about parsing the XML file, but I don't see an easier way.
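The parsing idea above can be sketched roughly as follows. The inline XML is a simplified stand-in for libvirt's real `cpu_map.xml` (which follows the same model/feature element pattern, with base models referenced by nested `<model>` elements); the helper name `model_features` is illustrative:

```python
# Hedged sketch: extracting the full feature set for a named CPU model from a
# cpu_map.xml-style file, following parent-model references recursively.
import xml.etree.ElementTree as ET

CPU_MAP = """
<cpus>
  <model name='Nehalem'>
    <feature name='sse'/>
    <feature name='sse2'/>
  </model>
  <model name='Westmere'>
    <model name='Nehalem'/>
    <feature name='aes'/>
    <feature name='rdtscp'/>
  </model>
</cpus>
"""

def model_features(root, model_name):
    """Collect feature names for a model, including inherited base models."""
    features = set()
    for model in root.iter('model'):
        if model.get('name') != model_name:
            continue
        for child in model:
            if child.tag == 'feature':
                features.add(child.get('name'))
            elif child.tag == 'model':          # reference to a base model
                features |= model_features(root, child.get('name'))
    return features

root = ET.fromstring(CPU_MAP)
print(sorted(model_features(root, 'Westmere')))
# ['aes', 'rdtscp', 'sse', 'sse2']
```

With something like this, the compute node could report the full feature list for whatever model `libvirt_cpu_model` selects, rather than only the host subset.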

3)  Supporting multiple different guest VM types.  This looks like a thornier problem that I'm willing to defer to another day.  I think there might be a use case for this, something like giving a cloud provider the ability to have tiers of service (offer a budget price for a Pentium-class server with the flexibility to run that VM on a Westmere machine), but I think that's more a second-order effect we can look at later.

Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Daniel P. Berrange [mailto:berrange at redhat.com] 
Sent: Thursday, January 10, 2013 10:53 AM
To: Dugger, Donald D
Cc: OpenStack Development Mailing List
Subject: Re: [openstack-dev] Host CPU feature exposure

On Thu, Jan 10, 2013 at 04:21:15PM +0000, Dugger, Donald D wrote:
> Well, the problem I'm trying to address is how to expose host features so that
> the scheduler can make decisions on those features.  A specific problem, for
> example, is how to create a special flavor that will start an instance on a
> machine that has the new Advanced Encryption Standard (`aes') instructions.
> I can create an ImagePropertiesFilter that specifies `aes' as a required
> feature for the image but the scheduler won't know which hosts are appropriate
> because the scheduler only knows that the host is a Westmere, not that `aes'
> is part of a Westmere system.
> What I'd like to see is all of the system features explicitly listed inside the
> scheduler.  Providing convenient shorthands (model Westmere means `sse' and
> `sse2' and `aes' and ...) is fine, but you also need to know exactly what is
> available.
> Note that there is no intent to be x86 specific, I would expect the same
> capability on a PPC or ARM system, just the specific names would change.

If you're making the scheduler apply logic in terms of CPU feature flags,
then you are definitely x86 specific, because there is no such concept
on other architectures.

> 1)  Host capabilities vs. guest VM capabilities.  Currently compute nodes
> send the `host' capabilities to the scheduler.  Although useful, the
> capabilities for the `guest VM' are probably more important.  I'm not
> that familiar with libvirt, but is it even possible to get a full set of
> guest features?  I've looked at the output from `virConnectGetCapabilities'
> and the guest features don't seem to be listed.

Again, we intentionally don't expose this information, because applying
logic based on CPU feature flags is fundamentally non-portable.

We provide an API which allows you to pass in a CPU description (where
a CPU description == a CPU model name + a list of features), and returns
status on whether the host can support that CPU description. This keeps
mgmt applications out of the business of doing architecture-specific CPU
comparisons.
> 2)  Guest VM type.  Currently, the type of guest to be created can be
> specified by the `libvirt_cpu_model' parameter in the `nova.conf' file.
> This means that a host will only support one guest type.  It would be
> more flexible to be able to specify the model type at run time.  A
> Westmere host can start Westmere, Nehalem, Penryn or other
> guests, so why restrict that host to just one guest type?

The only place where filtering based on CPU model takes place is in
the migration code, when it is trying to find a target host that is
compatible with what the guest is currently running on. Even before
the 'libvirt_cpu_model' parameter was introduced, this migration
code was doing an overly aggressive exact match on CPU models. This
clearly needs changing. The migration code is even more sucky: when
picking the target host, it picks the host first and only then invokes
the 'compare_cpu' function. If that fails, it picks another host and
retries, again and again.

IMHO the interaction between the scheduler and hypervisor hosts wrt
CPU models is flawed, not least because of the problems described
above, but also because the CPU information provided is not
standardized across Nova hypervisor drivers at all.

I think that Nova needs to have a formal concept of CPU types, with
arbitrary names it decides upon, e.g. it might allow a list of CPU
types such as:

  "Any Host"
  "Any Intel"
  "Any AMD"
  "Any AES"

Each hypervisor (libvirt, xen, hyper-v, esx, etc) would decide how
these CPU types map to their particular way of configuring CPUs
(libvirt would map them to CPU model + feature list, Xen would map
them to a CPUID string, VMWare would do whatever it does).

These CPU types would be associated with instance flavours. The hypervisor
hosts would simply report which of the CPU types they are able to support.
The scheduler can then trivially do host selection based on CPU types and
not need to know about CPU model names or feature flags.

The only place needing to know about CPU model names / features flags is
the place in the virt driver where you do the mapping of CPU types to
the virt driver specific config format. This could be made admin customizable
so people deploying Nova can provide further CPU types as they see fit.
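The proposal above can be sketched in a few lines; every name here (the type names, `LIBVIRT_CPU_TYPES`, `hosts_for_flavor`) is illustrative, not an existing Nova interface:

```python
# Hedged sketch of the abstract CPU type proposal: each virt driver owns a
# mapping from abstract type names to its config format, hosts report only
# the abstract names, and the scheduler matches on names alone.

# How the libvirt driver might translate abstract types (Xen would map the
# same names to CPUID strings, VMware to its own format):
LIBVIRT_CPU_TYPES = {
    "Any Host":  {"model": None, "features": []},
    "Any Intel": {"model": "qemu64", "features": ["vmx"]},
    "Any AES":   {"model": "Westmere", "features": ["aes"]},
}

# Hosts report only the abstract type names they can support ...
hosts = {
    "node1": {"Any Host", "Any Intel", "Any AES"},
    "node2": {"Any Host"},
}

def hosts_for_flavor(hosts, cpu_type):
    """... so the scheduler selects hosts by set membership, with no
    knowledge of CPU model names or feature flags."""
    return sorted(h for h, types in hosts.items() if cpu_type in types)

print(hosts_for_flavor(hosts, "Any AES"))   # ['node1']
print(hosts_for_flavor(hosts, "Any Host"))  # ['node1', 'node2']
```

Making the mapping table admin-editable is what would let deployers add further CPU types without touching scheduler code.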

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
