[openstack-dev] [Nova] PCI flavor objects - please review this proposal

Ian Wells ijw.ubuntu at cack.org.uk
Tue Jan 21 16:59:52 UTC 2014


Hi,

We've had a few comments about creating a new table specifically for PCI
flavors versus using the already existing host aggregates table, and John
Garbutt asked me to explain the concepts involved here to see what the body
of opinion was on the subject.  My opinion, which this account is
biased towards, is that the PCI flavor serves a specific purpose and host
aggregates are not terribly useful for any of the concepts here, but I
would like to ensure that whatever we put forward is going to be accepted
at review, so please have a read and see if you agree.  Apologies for the
essay, but I'm trying to summarise from start to finish; I hope the
explanation makes sense if you stick with it.


The current situation - the PCI code in Havana - has one concept, a 'PCI
whitelist', which - on each compute host - is a config item that describes
the available PCI devices that VMs can use, by matching the vendor and
product IDs on the card (i.e. what sort of card it is).  The Nova flavor
then has an additional extra_specs item that is a set of matching
expressions and counts.  Free matching PCI devices will meet the
requirements of the Nova flavor; the VM will be scheduled to wherever we
can find a full set of devices.
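
For reference, that arrangement looks roughly like this (the values are
illustrative and the option syntax has shifted a little between
releases):

    # nova.conf on the compute host: whitelist the Intel 82599 10G NICs
    pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10fb"}

    # nova.conf on the controller: give that matching expression a name
    pci_alias = {"vendor_id": "8086", "product_id": "10fb", "name": "10g"}

    # ask for one such device through a Nova flavor's extra_specs
    nova flavor-key m1.pci set "pci_passthrough:alias"="10g:1"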

This has advantages and issues.

I personally like the fact that the PCI whitelist lives on the compute node
- the hardware in the compute node is something very specific to that
compute node and doesn't change much, and I think that it's the right
approach to store it as config, therefore, rather than in the DB.  If a new
compute node is added, its config explains what PCI devices are available,
regardless of whether it's the same as, or different to, other nodes in the
system.  Speaking personally, managing these often repetitive configs
usually comes down to writing a bit of puppet, so while I do occasionally
get them wrong and have to fix them, it's not a massive overhead to roll
out new machines.

The biggest limitation is there are certain things with this scheme you
can't represent.  Sometimes you want to put your devices into Nova flavors,
but sometimes you want to use them in other ways.  For instance, I'd like
to do:

    nova boot --nic pci-flavor=10g,net-id=XXX ...

... where I'm referring to the PCI type directly, not in a Nova flavor.

For this purpose we came up with the concept of a 'PCI flavor', a named,
user-available grouping of PCI devices.  The PCI flavor specifies one type
of device, no matter how I happen to be grouping my devices together.

Also, we'd like administrators to have some control over the flavor at the
API level, even if the devices available are not API changeable.  I like to
think of this as the compute nodes reporting their resources, defined by
their configuration, and the API defining the requirements that a VM has,
based on what the administrator makes available (just as they do with Nova
flavors for an instance).

Finally, PCI devices are not all exactly the same, despite appearances.
Network devices can have specific connections; storage devices might be
connected to different SANs or have different devices attached.  You can't
do all your whitelisting and flavor defining using only the vendor and
product ID.


We've been through several design iterations, and where we stand at the
moment is that you can tag up the devices on the compute node with a config
item we've called pci_information, and you then group them using a PCI
flavor defined through the API.

pci_information lets you whitelist PCI devices as before.  This is still
important because you don't want to offer up your disk drives and control
network for users to map into their VMs.  But on top of that you can also
add extra information about PCI devices, generally information that details
things that can't be discovered about those devices but that you know about
when you're installing the machine - for instance, the network a NIC is
connected to.
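
As a sketch of how that might look (the option syntax and key names
here are illustrative rather than settled), a compute node whose 10G
NICs are cabled to a particular physical network might carry something
like:

    # nova.conf on the compute host: the first part whitelists the
    # devices, the second attaches extra, locally-known tags to them
    pci_information = [ {"vendor_id": "8086", "product_id": "10fb"},
                        {"connected_net": "physnet-data"} ]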

PCI flavors, independently of the pci_information configuration, describe
which device groups are available to users.  So that would be the 10g
devices, the GPUs and so on.  If you want to change your offerings on the
fly you can do that, subject to the resources that the pci_information is
offering out.  You can select specific device types based on the basic PCI
information and the extra information that you put in the pci_information
configuration, which means you've got some flexibility with your
configuration.
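
To make the shape of that concrete, here is a purely hypothetical CLI
session (none of these command names or match keys are settled):

    # define a user-visible PCI flavor and describe the devices it
    # selects, using base PCI attributes plus the extra tags above
    nova pci-flavor-create 10g
    nova pci-flavor-update 10g set \
        "vendor_id"="8086" "connected_net"="physnet-data"

The '10g' name is then what a user would refer to in the earlier nova
boot example.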


Now we're good so far, but in recent meetings John Garbutt has been making
a strong case that host aggregates solve our problems better, and here's
where I'd like your opinions.

Firstly, they can be used to define the data that pci_information holds.
Instead of putting this in compute node configuration, you can use a host
aggregate with additional key-value information to define what devices
you're after from each compute node in the aggregate.  This will work, but
there are two issues I see with it - firstly, this information is precisely
the sort of information I think belongs in the host config file, the sort
of thing that's readily to hand when you deploy the machine, and secondly a
new machine will not actually have any PCI passthrough devices available
until you add it to an aggregate via the API.
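
For comparison, the aggregate-based version would presumably be driven
through the existing aggregate API, something like the following (the
pci_devices metadata key is invented for the sake of the example):

    # create an aggregate, hang the device description off it as
    # metadata, then add every compute host that carries such devices
    nova aggregate-create pci-10g
    nova aggregate-set-metadata pci-10g \
        pci_devices='{"vendor_id":"8086","product_id":"10fb"}'
    nova aggregate-add-host pci-10g compute-03

That last step is the one a freshly deployed compute node would be
waiting on before any of its devices became usable.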

Secondly, they can be used to define PCI flavors, rather than making a new
API object, with new calls and a new database table.  The problem here is
that PCI flavors group devices by attributes - the vendor, the product, or
the attached network - and generally have no relationship to the hosts at
all.  We can do this, but all it saves us is a straightforward object and
table of well structured data, and the cost is abusing a data structure
that's not even keyed off a relevant type of information.

Obviously, that's my opinion.  I hope John will give the other side of the
argument.  But I'd like to know, has anyone else got any thoughts on the
suggestions here?  We'd like to get this resolved before we get to the
review war stage.
-- 
Ian.

