<div dir="ltr"><div><div>Hi,<br><br></div>We've had a few comments about creating a new table specifically for PCI flavors versus using the already existing host aggregates table, and John Garbutt asked me to explain the concepts involved here to see what the body of opinion was on the subject. My opinion, which which this account is biased towards, is that the PCI flavor serves a specific purpose and host aggregates are not terribly useful for any of the concepts here, but I would like to ensure that whatever we put forward is going to be accepted at review, so please have a read and see if you agree. Apoogies for the essay, but I'm trying to summarise from start to finish; I hope the explanation makes sense if you stick with it.<br>

The current situation - the PCI code in Havana - has one concept, a 'PCI whitelist', which is a config item on each compute host describing the PCI devices that VMs can use, matched by the vendor and product IDs on the card (i.e. what sort of card it is). The Nova flavor then has an additional extra_specs item that is a set of matching expressions and counts. Free PCI devices that match are used to meet the requirements of the Nova flavor, and the VM is scheduled wherever a full set of devices can be found.
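
For concreteness, this is roughly what the Havana mechanism looks like (the vendor and product IDs here are illustrative):

    # nova.conf on the compute host: expose matching devices for passthrough
    pci_passthrough_whitelist = [{"vendor_id": "8086", "product_id": "10fb"}]

    # nova.conf where the API runs: give a matching expression a name
    pci_alias = {"vendor_id": "8086", "product_id": "10fb", "name": "10g"}

    # ask for two matching devices via a Nova flavor's extra_specs
    nova flavor-key m1.pci set "pci_passthrough:alias"="10g:2"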

This scheme has advantages and issues.

I personally like the fact that the PCI whitelist lives on the compute node - the hardware in a compute node is very specific to that node and doesn't change much, so I think it's right to store it as config rather than in the DB. If a new compute node is added, its config explains what PCI devices are available, regardless of whether it's the same as, or different to, other nodes in the system. Speaking personally, managing these often repetitive configs usually comes down to writing a bit of puppet, so while I do occasionally get them wrong and have to fix them, it's not a massive overhead to roll out new machines.

The biggest limitation is that there are certain things you can't represent with this scheme. Sometimes you want to put your devices into Nova flavors, but sometimes you want to use them in other ways. For instance, I'd like to do:

    nova boot --nic pci-flavor=10g,net-id=XXX ...

... where I'm referring to the PCI type directly, not via a Nova flavor.

For this purpose we came up with the concept of a 'PCI flavor': a named, user-available grouping of PCI devices. A PCI flavor specifies one type of device, however I choose to group my devices together.

Also, we'd like administrators to have some control over the flavor at the API level, even if the devices available are not changeable through the API. I like to think of this as the compute nodes reporting their resources, defined by their configuration, and the API defining the requirements that a VM has, based on what the administrator makes available (just as they do with Nova flavors for an instance).

Finally, PCI devices are not all exactly the same, despite appearances. Network devices can have specific connections; storage devices might be connected to different SANs or have different devices attached. You can't do all your whitelisting and flavor definition using only the vendor and product ID.

We've been through several design iterations, and where we stand at the moment is that you tag up the devices on the compute node with a config item we've called pci_information, and you then group them using a PCI flavor defined through the API.

pci_information lets you whitelist PCI devices as before. This is still important, because you don't want to offer up your disk drives and control network for users to map into their VMs. But on top of that you can also attach extra information to PCI devices - generally details that can't be discovered from the devices themselves but that you know when you're installing the machine, for instance the network a NIC is connected to.
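
As a sketch (the exact config syntax is still being settled, so treat the format and key names here as illustrative, not final):

    # illustrative only: whitelist 10G NICs as before, and also record
    # which physical network each is cabled to - a fact Nova can't discover
    pci_information = [ [ {"vendor_id": "8086", "product_id": "10fb"},
                          {"e.physical_network": "physnet1"} ] ]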

PCI flavors, independently of the pci_information configuration, describe which device groups are available to users - the 10g devices, the GPUs and so on. If you want to change your offerings on the fly you can, subject to the resources that pci_information is offering out. You can select specific device types based on the basic PCI information plus the extra information you put in the pci_information configuration, which gives you some flexibility in how you configure things.
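
Defining a flavor might then look something like this (hypothetical CLI, since this API is exactly what's under discussion):

    # hypothetical: define a user-available PCI flavor by matching on basic
    # PCI attributes plus the extra info recorded in pci_information
    nova pci-flavor-create 10g \
        --match "vendor_id=8086 product_id=10fb e.physical_network=physnet1"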

Now we're good so far, but in recent meetings John Garbutt has been making a strong case that host aggregates solve our problems better, and here's where I'd like your opinions.

Firstly, they can be used to hold the data that pci_information holds. Instead of putting this in compute node configuration, you can use a host aggregate with additional key-value metadata to define which devices you're after from each compute node in the aggregate. This will work, but I see two issues with it: firstly, this is precisely the sort of information I think belongs in the host config file, the sort of thing that's readily to hand when you deploy the machine; and secondly, a new machine will not actually have any PCI passthrough devices available until you add it to an aggregate via the API.
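
Concretely, it would go something like this (the aggregate commands exist today; the metadata key is invented for the example):

    # put the compute host in an aggregate and hang the device-matching
    # data off it as metadata - until this is done, the host's PCI
    # devices are invisible to passthrough
    nova aggregate-create 10g-hosts
    nova aggregate-add-host 10g-hosts compute-1
    nova aggregate-set-metadata 10g-hosts \
        pci_devices='{"vendor_id": "8086", "product_id": "10fb"}'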

Secondly, they can be used to define PCI flavors, rather than making a new API object with new calls and a new database table. The problem here is that PCI flavors group devices by attributes - the vendor, the product, or the attached network - and generally have no relationship to hosts at all. We can do this, but we save ourselves one straightforward object and table of well-structured data by abusing a data structure that isn't even keyed off a relevant type of information.
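
To make the mismatch concrete (again, the metadata key is invented): the 'aggregate' standing in for a 10g flavor is really just a match expression, and its host membership is incidental and maintained by hand.

    # invented example: an aggregate abused as a PCI flavor - every host
    # with a matching card must be added individually, even though the
    # grouping is about device attributes, not hosts
    nova aggregate-create flavor-10g
    nova aggregate-add-host flavor-10g compute-1   # ...and so on, per host
    nova aggregate-set-metadata flavor-10g \
        match='vendor_id=8086 product_id=10fb'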

Obviously, that's my opinion, and I hope John will give the other side of the argument. But I'd like to know: has anyone else got any thoughts on the suggestions here? We'd like to get this resolved before we get to the review war stage.
-- 
Ian.