[openstack-dev] [nova] [neutron] PCI pass-through network support

Jiang, Yunhong yunhong.jiang at intel.com
Mon Jan 13 20:00:57 UTC 2014

Ian, I'm not sure I get your question. Why should the scheduler get the number of flavor types requested? The scheduler will only translate the PCI flavor to a PCI property match requirement, as it does now (either vendor_id, device_id, or an item in extra_info), and then match the translated PCI flavor, i.e. the PCI requests, against the pci_stats.


From: Ian Wells [mailto:ijw.ubuntu at cack.org.uk]
Sent: Monday, January 13, 2014 10:57 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

It's worth noting that this makes the scheduling a computationally hard problem. The answer to that in this scheme is to reduce the number of inputs to trivialise the problem.  It's going to be O(f(number of flavor types requested, number of pci_stats pools)), and if you group appropriately there shouldn't be an excessive number of pci_stats pools.  I am not going to stand up and say this makes it achievable - and if it doesn't, then I'm not sure that anything would make overlapping flavors achievable - but I think it gives us some hope.
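To make the scale concrete, here is a minimal sketch (hypothetical names, not nova's actual code) of checking a host's pci_stats pools against a set of flavor requests; the work is roughly proportional to requests × pools, which is why keeping the pool count small matters:

```python
# Hypothetical sketch: each pool summarises identical devices on a host,
# each request is a set of property constraints translated from a PCI flavor.

def pools_satisfying(request, pools):
    """Return the pools whose properties match every constraint in request."""
    return [p for p in pools
            if all(p.get(k) == v for k, v in request.items())]

def host_can_fit(requests, pools):
    """Greedy feasibility check: O(len(requests) * len(pools)) comparisons."""
    available = [dict(p) for p in pools]  # copy counts so we can decrement
    for request, count in requests:
        for pool in pools_satisfying(request, available):
            take = min(count, pool['count'])
            pool['count'] -= take
            count -= take
            if count == 0:
                break
        if count > 0:
            return False
    return True
```

Note that a greedy pass like this is exactly where overlapping flavors bite: the order in which requests consume pools can change the answer, which is what makes the general problem hard.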

On 13 January 2014 19:27, Jiang, Yunhong <yunhong.jiang at intel.com<mailto:yunhong.jiang at intel.com>> wrote:
Hi, Robert, the scheduler keeps counts based on pci_stats instead of the PCI flavor.

As Ian already stated at https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html, the flavor will only use the tags used by pci_stats.


From: Robert Li (baoli) [mailto:baoli at cisco.com<mailto:baoli at cisco.com>]
Sent: Monday, January 13, 2014 8:22 AM

To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, if I understand PCI flavor correctly, the issue we need to deal with is overlapping. The simplest case of overlapping is that you can define a flavor F1 as [vendor_id='v', product_id='p'] and a flavor F2 as [vendor_id='v'].  Let's assume that only the admin can define the flavors. It's not hard to see that a device can belong to the two different flavors at the same time. This introduces an issue in the scheduler. Suppose the scheduler (counts or stats based) maintains counts based on flavors (or the keys corresponding to the flavors). To satisfy a request for a device with flavor F1, the count for F2 needs to be decremented by one as well. There may be several ways to achieve that, but regardless, it introduces tremendous overhead in terms of both system processing and administrative cost.
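A toy illustration of the overlap (hypothetical names, for illustration only): a single device satisfies both flavor definitions, so flavor-keyed counters have to be decremented in lockstep whenever it is allocated:

```python
# Hypothetical illustration: one device, two overlapping flavor definitions.
device = {'vendor_id': 'v', 'product_id': 'p'}

F1 = {'vendor_id': 'v', 'product_id': 'p'}
F2 = {'vendor_id': 'v'}

def matches(flavor, dev):
    """True if the device has every property the flavor demands."""
    return all(dev.get(k) == v for k, v in flavor.items())

# The same device counts toward both flavors...
assert matches(F1, device) and matches(F2, device)

# ...so allocating it under F1 must also decrement the F2 count:
counts = {'F1': 1, 'F2': 1}
for name, flavor in (('F1', F1), ('F2', F2)):
    if matches(flavor, device):
        counts[name] -= 1
assert counts == {'F1': 0, 'F2': 0}
```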

What are the use cases for that? How practical are those use cases?


On 1/10/14 9:34 PM, "Ian Wells" <ijw.ubuntu at cack.org.uk<mailto:ijw.ubuntu at cack.org.uk>> wrote:

> OK - so if this is good then I think the question is how we could change the 'pci_whitelist' parameter we have - which, as you say, should either *only* do whitelisting or be renamed - to allow us to add information.  Yongli has something along those lines but it's not flexible and it distinguishes poorly between which bits are extra information and which bits are matching expressions (and it's still called pci_whitelist) - but even with those criticisms it's very close to what we're talking about.  When we have that I think a lot of the rest of the arguments should simply resolve themselves.
> [yjiang5_1] The reason it's not easy to find a flexible, distinguishable change to pci_whitelist is that it combines two things. So a naive solution in my head is: change it to a VERY generic name, 'pci_devices_information',
> and change the schema to an array of {'device_property': regex exp, 'group_name': 'g1'} dictionaries, where the device_property expression can be 'address == xxx, vendor_id == xxx' (i.e. similar to the current whitelist). Then we can squeeze more into "pci_devices_information" in future, like 'network_information' = xxx or the Neutron-specific information you required in the previous mail.

We're getting to the stage where an expression parser would be useful, annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ "match": { "class": "Acme inc. discombobulator" }, "info": { "group": "we like teh groups", "volume": "11" } }
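Spelled out as a whitelist-style config entry in that shape, it might look something like the following (purely illustrative; neither the option name - taken from the 'pci_devices_information' suggestion above - nor the keys are settled):

```
pci_devices_information = [
    { "match": { "vendor_id": "8086", "product_id": "1520" },
      "info":  { "group": "g1", "network_id": "physnet1" } },
    { "match": { "address": "0000:04:00.*" },
      "info":  { "group": "g2" } }
]
```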

> All keys other than 'device_property' become extra information, i.e. software-defined properties. This extra information will be carried with the PCI devices. Some implementation details: A) we can limit the acceptable keys, e.g. we only support 'group_name', 'network_id'; or we can accept any keys other than the reserved (vendor_id, device_id etc.) ones.

Not sure we have a good list of reserved keys at the moment, and with two dicts it isn't really necessary, I guess.  I would say that we have one match parser which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for k, v in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys match against the extra (software-defined) specs
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        if mv != v:
            return False
    return True

Usable for whitelist matching (where 'e.' just won't match anything) and also for flavor assignment (where 'e.' will indeed match the extra values).

> B) if a device matches the 'device_property' in several entries, raise an exception, or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.
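"Use the first one" could be sketched like this (hypothetical helper, not nova code) - resolve a device against the ordered config entries and take the first whose match expression applies:

```python
# Hypothetical: resolve a device against ordered config entries,
# taking the first entry whose match expression applies.
def first_matching_entry(device, entries):
    for entry in entries:
        if all(device.get(k) == v for k, v in entry['match'].items()):
            return entry
    return None

entries = [
    {'match': {'vendor_id': 'v', 'product_id': 'p'}, 'info': {'group': 'g1'}},
    {'match': {'vendor_id': 'v'}, 'info': {'group': 'g2'}},
]
device = {'vendor_id': 'v', 'product_id': 'p'}
# Both entries match; the first wins.
assert first_matching_entry(device, entries)['info']['group'] == 'g1'
```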

> [yjiang5_1] Another thing that needs discussing is, as you pointed out, "we would need to add a config param on the control host to decide which flags to group on when doing the stats". I agree with the design, but some details need to be decided.

This is a patch that can come at any point after we do the above stuff (which we need for Neutron), clearly.

> Where should it be defined? If we a) define it on both the control node and the compute node, then it should be statically defined (just change pool_keys in "/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter). Or b) define it only on the control node; then I assume the control node would be the scheduler node, the scheduler manager would need to save this information and present an API to fetch it, and the compute node would need to fetch it on every update_available_resource() periodic task. I'd prefer to take option a) as a first step. Your idea?
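Option a) amounts to making the grouping keys a static config value read on both nodes. A sketch of the grouping step it would parameterise (POOL_KEYS standing in for the would-be config parameter; names assumed, not nova's actual code):

```python
from collections import Counter

# Sketch: build pci_stats pools by grouping devices on configurable keys.
# Under option a), POOL_KEYS would be a static config parameter present
# on both the control and compute nodes.
POOL_KEYS = ['vendor_id', 'product_id', 'group_name']  # hypothetical default

def build_pools(devices, pool_keys=POOL_KEYS):
    """Group devices by their values on pool_keys; one pool per distinct tuple."""
    groups = Counter(
        tuple((k, dev.get(k)) for k in pool_keys) for dev in devices)
    return [dict(key, count=count) for key, count in groups.items()]
```

The compute node would report pools in this reduced form, and the scheduler would consume them, which is why both sides need to agree on the key list.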

I think it has to be (a), which is a shame.

OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org<mailto:OpenStack-dev at lists.openstack.org>
