<html><body><p><tt>Roman Dobosz <roman.dobosz@intel.com> wrote on 2016/07/20 02:03:28:<br><br>> From: Roman Dobosz <roman.dobosz@intel.com></tt><br><tt>> To: openstack-dev <openstack-dev@lists.openstack.org></tt><br><tt>> Date: 2016/07/20 02:07</tt><br><tt>> Subject: [openstack-dev] FPGA as a dynamic nested resources</tt><br><tt>> <br>> Hi all,<br>> <br>> Some time ago Jay Pipes published an etherpad[1] with ideas around<br>> modelling nested resources, taking NUMA as an example. I was also<br>> encouraged ;) to start this thread at the last Nova scheduler meeting.<br>> <br>> I have read the mentioned etherpad, and what struck me was that the<br>> described scenario with NUMA cells resembles, to some extent, the way<br>> an FPGA can be managed.<br>> <br>> A NUMA cell can be treated as a vessel for memory cells, and it is<br>> expressed as a number of MB. So it is possible to extract the<br>> information from existing data and add another level of aggregation<br>> using only a cleverly prepared SQL query.<br>> <br>> I think the problem might be broader than what a slightly tweaked<br>> existing model can cover. If we look at the resources which an FPGA<br>> may expose, there can be a couple of levels, and each of them can be<br>> treated as a resource.<br>> <br>> Three levels of FPGA resources can be identified, which can be nested<br>> one inside another:<br>> <br>> 1. Whole FPGA. If a discrete FPGA is used, then even today it might be<br>>    passed through to the VM.<br>> <br>> 2. Region in FPGA. Some FPGA models can be divided into regions<br>>    or slots. Also, for some models it is possible to (re)program such<br>>    a region individually - in this case there is a possibility to pass<br>>    the entire slot to the VM, so that the slot can be reprogrammed<br>>    and the algorithm utilized from within the VM.<br>> <br>> 3. Accelerator in region/FPGA. 
If there is an accelerator programmed<br>>    in the slot, it is possible that such an accelerator provides us with<br>>    Virtual Functions (similar to SR-IOV); then every available VF<br>>    can be treated as a resource.<br>> <br>> 4. It might also be necessary to track every VF individually, although<br>>    I didn't assume it would be needed; nevertheless, with nested<br>>    resources it should be easy to handle.</tt><br><tt>You do need it. For example, say you have 4 regions and 8 VFs. Some regions are</tt><br><tt>configured with a shared accelerator, so they can be shared among multiple VMs</tt><br><tt>(each consuming a VF). But other regions are configured with a private,</tt><br><tt>exclusive accelerator, so they can only be bound to one VF. That's why we need</tt><br><tt>to track both regions and VFs.</tt><br><tt><br>> <br>> The correlation between such resources is a bit different from NUMA:<br>> while in the NUMA case it is possible either to schedule a VM with<br>> some memory specified or to request memory within a NUMA cell, with an<br>> FPGA, if a slot is taken or an accelerator is already programmed and in<br>> use, there is no way to offer the FPGA as a whole to the tenant until<br>> all accelerators and slots are free.<br>> <br>> I've followed Jay's idea about nested resources and, with the<br>> blueprint[2] on dynamic resources in mind, I've worked out how FPGAs<br>> fit in.<br>> <br>> The tables are unchanged - they are copied from the etherpad[1]:<br>> <br>> <br>> CREATE TABLE resource_providers (<br>>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,<br>>     uuid CHAR(36) NOT NULL,<br>>     name VARCHAR(100) NULL,<br>>     root_provider_id INT NULL,<br>>     parent_provider_id INT NULL<br>> );<br>> <br>> CREATE TABLE inventories (<br>>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,<br>>     resource_provider_id INT NOT NULL,<br>>     resource_class_id INT NOT NULL,<br>>     total INT NOT NULL,<br>>     reserved INT NOT NULL,<br>>     min_unit INT NOT NULL,<br>>     max_unit INT NOT NULL,<br>>   
  step_size INT NOT NULL,<br>>     allocation_ratio INT NOT NULL<br>> );<br>> <br>> CREATE TABLE allocations (<br>>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,<br>>     resource_provider_id INT NOT NULL,<br>>     consumer_uuid CHAR(36) NOT NULL,<br>>     resource_class_id INT NOT NULL,<br>>     used INT NOT NULL<br>> );<br>> <br>> <br>> Then let's fill the tables with data of the following structure:<br>> <br>> -- FPGA-1<br>> --   +- FPGA-1 slot1 (taken), resource_provider_id:<br>> --   +- FPGA-1 slot2<br>> --       +- FPGA-1 slot2 acceleratorX<br>> --           +- FPGA-1 slot2 acceleratorX VF1 (taken)<br>> --           +- FPGA-1 slot2 acceleratorX VF2 (taken)<br>> --           +- FPGA-1 slot2 acceleratorX VF3 (taken)<br>> --           +- FPGA-1 slot2 acceleratorX VF4 (taken)<br>> --           +- FPGA-1 slot2 acceleratorX VF5<br>> --           +- ..<br>> --           +- FPGA-1 slot2 acceleratorX VF32<br>> --   +- FPGA-1 slot3<br>> -- FPGA-2<br>> --   +- FPGA-2 slot1<br>> <br>> where FPGA-1 and FPGA-2 are hosts with an FPGA on board. 
It is also<br>> assumed that new dynamic resource classes are created: id 1666 means<br>> 'FPGA' (although it might simply be a standard class, hardcoded in the<br>> ENUM), 1667 means 'FPGA slot' and 1668 means 'FPGA accelerator'.<br>> <br>> <br>> INSERT INTO resource_providers VALUES<br>> (1, '<UUID>', 'FPGA-1', 1, NULL),<br>> (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),<br>> (3, '<UUID>', 'FPGA-1 slot 2', 1, 1),<br>> (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),<br>> (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),<br>> (6, '<UUID>', 'FPGA-2', 6, NULL),<br>> (7, '<UUID>', 'FPGA-2 slot', 6, 6);<br>> <br>> <br>> INSERT INTO inventories VALUES<br>> (1, 1, 1666, 1, 0, 1, 1, 1, 1.0),<br>> (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),<br>> (3, 3, 1667, 1, 0, 1, 1, 1, 1.0),<br>> (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),<br>> (5, 5, 1668, 32, 0, 1, 32, 1, 1.0),<br>> (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),<br>> (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);<br>> <br>> INSERT INTO allocations VALUES<br>> (1, 5, '<UUID>', 1668, 4),<br>> (2, 2, '<UUID>', 1667, 1);<br>> <br>> <br>> To get the id of a resource of type acceleratorX that can allocate 8 VFs:<br>> <br>> <br>> SELECT rp.id<br>> FROM resource_providers rp<br>> LEFT JOIN allocations al ON al.resource_provider_id = rp.id<br>> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id<br>> WHERE iv.resource_class_id = 1668<br>> AND (iv.total - COALESCE(al.used, 0)) >= 8;<br>> <br>> <br>> Note that I don't have to calculate the total number of available VFs<br>> in this case, although a user might schedule a VM which requests more<br>> VFs than are available in a single accelerator; then such a<br>> calculation will be needed.<br>> <br>> Requesting more VFs than are available will not return any records:<br>> <br>> <br>> SELECT rp.id<br>> FROM resource_providers rp<br>> LEFT JOIN allocations al ON al.resource_provider_id = rp.id<br>> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id<br>> WHERE iv.resource_class_id = 1668<br>> AND (iv.total - COALESCE(al.used, 
0)) >= 29;<br>> <br>> <br>> Nothing fancy here. More interesting cases would be for getting all<br>> unallocated slots:<br>> <br>> <br>> SELECT rp.id<br>> FROM resource_providers rp<br>> LEFT JOIN inventories iv on iv.resource_provider_id = rp.id<br>> WHERE iv.resource_class_id = 1667<br>> AND rp.id not in (<br>>     SELECT rp.parent_provider_id as id<br>>     FROM allocations al<br>>     LEFT JOIN inventories iv on al.resource_provider_id  = <br>> iv.resource_provider_id<br>>     LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id<br>>     WHERE al.resource_class_id = 1668<br>>     UNION<br>>     SELECT iv.resource_provider_id as id<br>>     FROM allocations al<br>>     LEFT JOIN inventories iv on al.resource_provider_id  = <br>> iv.resource_provider_id<br>>     LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id<br>>     WHERE al.resource_class_id = 1667<br>> );<br>> <br>> <br>> Or get all unallocated whole FPGA:<br>> <br>> <br>> SELECT rp.id<br>> FROM resource_providers rp<br>> LEFT JOIN inventories iv on rp.id = iv.resource_provider_id<br>> WHERE iv.resource_class_id = 1666<br>> AND rp.id NOT in (<br>>     SELECT rp.parent_provider_id<br>>     FROM resource_providers rp<br>>     LEFT JOIN inventories iv on iv.resource_provider_id = rp.id<br>>     WHERE iv.resource_class_id = 1667<br>>     AND rp.id in (<br>>         SELECT rp.parent_provider_id as id<br>>         FROM allocations al<br>>         LEFT JOIN inventories iv on al.resource_provider_id  = <br>> iv.resource_provider_id<br>>         LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id<br>>         WHERE al.resource_class_id = 1668<br>>         UNION<br>>         SELECT iv.resource_provider_id as id<br>>         FROM allocations al<br>>         LEFT JOIN inventories iv on al.resource_provider_id  = <br>> iv.resource_provider_id<br>>         LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id<br>>         WHERE al.resource_class_id = 1667<br>>    
 )<br>> );<br>> <br>> <br>> Those two queries are similar in that, if a user requests a slot or a<br>> whole FPGA, we have to check that no accelerator (in use) occupies the<br>> slot in the case of the slot query, and perform the same check plus an<br>> additional one for slot usage when querying for a free FPGA.<br>> <br>> There is another topic which I haven't thought through yet:<br>> potentially available resources, that is, an accelerator/IP which<br>> might be requested during VM boot but doesn't exist yet. In the case<br>> of an FPGA, it might simply be brought up by an external entity<br>> (presumably a library or service) which will take on the burden of<br>> preparing such an accelerator/IP on a free slot and of updating the<br>> information about allocations and dynamic resources.<br>> <br>> Thoughts?<br>> <br>> -- <br>> Cheers,<br>> Roman Dobosz<br>> <br>> __________________________________________________________________________<br>> OpenStack Development Mailing List (not for usage questions)<br>> Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe<br>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>> <br></tt><BR>
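<tt>To illustrate why both regions and VFs need tracking, here is a minimal sketch in Python using the standard sqlite3 module. It is not part of the proposal: the trimmed-down schema, the function names build_db, accelerators_with_free_vfs and free_slots, and the use of SUM over allocation rows are assumptions made for illustration. It loads the example data quoted above and answers two questions separately: which accelerators still have enough free VFs (summing all allocation rows, so several VMs sharing one accelerator are counted), and which slots are completely free (no allocation on the slot itself or on an accelerator directly inside it).</tt><br>
<pre>
```python
import sqlite3

# Resource class ids taken from the quoted example.
FPGA, FPGA_SLOT, FPGA_ACCEL = 1666, 1667, 1668


def build_db():
    """Create an in-memory database holding the quoted example data."""
    db = sqlite3.connect(":memory:")
    # Simplified schema (assumption): only the columns the queries need.
    db.executescript("""
        CREATE TABLE resource_providers (
            id INTEGER PRIMARY KEY,
            name TEXT,
            root_provider_id INTEGER,
            parent_provider_id INTEGER
        );
        CREATE TABLE inventories (
            resource_provider_id INTEGER NOT NULL,
            resource_class_id INTEGER NOT NULL,
            total INTEGER NOT NULL
        );
        CREATE TABLE allocations (
            resource_provider_id INTEGER NOT NULL,
            resource_class_id INTEGER NOT NULL,
            used INTEGER NOT NULL
        );
    """)
    db.executemany(
        "INSERT INTO resource_providers VALUES (?, ?, ?, ?)",
        [(1, "FPGA-1", 1, None),
         (2, "FPGA-1 slot 1", 1, 1),
         (3, "FPGA-1 slot 2", 1, 1),
         (4, "FPGA-1 slot 3", 1, 1),
         (5, "FPGA-1 slot 2 acceleratorX", 1, 3),
         (6, "FPGA-2", 6, None),
         (7, "FPGA-2 slot", 6, 6)])
    db.executemany(
        "INSERT INTO inventories VALUES (?, ?, ?)",
        [(1, FPGA, 1), (2, FPGA_SLOT, 1), (3, FPGA_SLOT, 1),
         (4, FPGA_SLOT, 1), (5, FPGA_ACCEL, 32), (6, FPGA, 1),
         (7, FPGA_SLOT, 1)])
    db.executemany(
        "INSERT INTO allocations VALUES (?, ?, ?)",
        [(5, FPGA_ACCEL, 4), (2, FPGA_SLOT, 1)])
    return db


def accelerators_with_free_vfs(db, wanted):
    """Return ids of accelerators with at least `wanted` unallocated VFs."""
    rows = db.execute("""
        SELECT iv.resource_provider_id
        FROM inventories iv
        LEFT JOIN allocations al
               ON al.resource_provider_id = iv.resource_provider_id
              AND al.resource_class_id = iv.resource_class_id
        WHERE iv.resource_class_id = ?
        GROUP BY iv.resource_provider_id, iv.total
        HAVING iv.total - COALESCE(SUM(al.used), 0) >= ?
        ORDER BY iv.resource_provider_id
    """, (FPGA_ACCEL, wanted))
    return [row[0] for row in rows]


def free_slots(db):
    """Return ids of slots with no allocation on them or on a direct child."""
    rows = db.execute("""
        SELECT rp.id
        FROM resource_providers rp
        JOIN inventories iv ON iv.resource_provider_id = rp.id
        WHERE iv.resource_class_id = ?
          AND rp.id NOT IN (SELECT resource_provider_id FROM allocations)
          AND rp.id NOT IN (
              SELECT child.parent_provider_id
              FROM resource_providers child
              JOIN allocations al ON al.resource_provider_id = child.id
              WHERE child.parent_provider_id IS NOT NULL)
        ORDER BY rp.id
    """, (FPGA_SLOT,))
    return [row[0] for row in rows]
```
</pre>
<tt>With the quoted data, accelerators_with_free_vfs(db, 8) returns [5], asking for 29 returns an empty list, and free_slots(db) returns [4, 7]. The slot check only looks one level down, like the quoted queries; deeper nesting would need a recursive query.</tt><br>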

</body></html>