[openstack-dev] FPGA as a dynamic nested resources
Fei K Chen
uchen at cn.ibm.com
Thu Jul 21 00:44:21 UTC 2016
Roman Dobosz <roman.dobosz at intel.com> wrote on 2016/07/20 02:03:28:
> From: Roman Dobosz <roman.dobosz at intel.com>
> To: openstack-dev <openstack-dev at lists.openstack.org>
> Date: 2016/07/20 02:07
> Subject: [openstack-dev] FPGA as a dynamic nested resources
>
> Hi all,
>
> Some time ago Jay Pipes published an etherpad[1] with ideas around
> modelling nested resources, taking NUMA as an example. I was also
> encouraged ;) to start this thread at the last Nova scheduler meeting.
>
> I have read the etherpad, and what struck me is that the scenario
> described for NUMA cells resembles, to some extent, the way an FPGA
> can be managed.
>
> A NUMA cell can be treated as a vessel for memory, which is
> expressed as a number of MB. So it is possible to extract the
> information from existing data and add another level of aggregation
> using only a cleverly prepared SQL query.
>
> I think the problem might be broader than what the existing model,
> tweaked a bit, can cover. If we look at the resources an FPGA may
> expose, there can be a couple of levels, and each of them can be
> treated as a resource.
>
> Three levels of FPGA resources can be identified, each of which can be
> nested in the one above it:
>
> 1. Whole FPGA. If a discrete FPGA is used, then even today it might be
> passed through to the VM.
>
> 2. Region in FPGA. Some FPGA models can be divided into regions
> or slots. Also, for some models it is possible to (re)program such a
> region individually - in this case it is possible to pass the
> entire slot to the VM, so that the slot can be reprogrammed and its
> algorithm used from within the VM.
>
> 3. Accelerator in region/FPGA. If there is an accelerator programmed
> in the slot, it is possible that the accelerator provides us with
> Virtual Functions (similar to SR-IOV); then every available VF
> can be treated as a resource.
>
> 4. It might also be necessary to track every VF individually. I didn't
> assume it would be needed, but with nested resources it should be
> easy to handle.
You do need it. For example, say you have 4 regions and 8 VFs. Some regions
are configured with an accelerator that can be shared by multiple VMs (each
consuming a VF), but other regions are configured with a private, exclusive
accelerator that can only be bound to one VF. That's why we need to track
both regions and VFs.
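
A minimal sketch of that case, just to make the bookkeeping concrete. The
Region class, bind_vf() and the region/VF names are all made up for
illustration; none of this is Nova code. It assumes a device-wide pool of
8 VFs and 4 regions, two exclusive and two shared (up to 4 VFs each):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Region:
    """Hypothetical model of one FPGA region (not a Nova object)."""
    name: str
    shared: bool      # True: accelerator may serve several VMs
    vf_limit: int     # VFs a shared accelerator can bind
    bound: List[Tuple[str, str]] = field(default_factory=list)  # (vf, vm)

    def capacity(self) -> int:
        # An exclusive accelerator binds exactly one VF.
        return self.vf_limit if self.shared else 1

    def has_room(self) -> bool:
        return len(self.bound) < self.capacity()


def bind_vf(regions: List[Region], free_vfs: List[str],
            vm: str) -> Optional[str]:
    """Bind one free VF to the first region with room; None if impossible."""
    for region in regions:
        if free_vfs and region.has_room():
            region.bound.append((free_vfs.pop(0), vm))
            return region.name
    return None


regions = [
    Region("region0", shared=False, vf_limit=4),
    Region("region1", shared=True, vf_limit=4),
    Region("region2", shared=False, vf_limit=4),
    Region("region3", shared=True, vf_limit=4),
]
free_vfs = [f"vf{i}" for i in range(8)]
placements = [bind_vf(regions, free_vfs, f"vm{i}") for i in range(1, 10)]
```

The ninth request fails because the VF pool is empty, even though region3
still has room, and the exclusive regions stop at one VF each. Neither
limit can be checked if only one of the two levels is tracked.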
>
> The correlation between such resources is a bit different from NUMA.
> While in the NUMA case it is possible either to schedule a VM with
> some memory specified or to request memory within a NUMA cell, with an
> FPGA, if a slot is taken or an accelerator is already programmed and in
> use, there is no way to offer the FPGA as a whole to the tenant until
> all accelerators and slots are free.
>
> I've followed Jay's idea about nested resources and, having in mind the
> blueprint[2] regarding dynamic resources, I've prepared an example of
> how it could fit in.
>
> Tables are unchanged - it is a copy-paste from the etherpad[1]:
>
>
> CREATE TABLE resource_providers (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     uuid CHAR(36) NOT NULL,
>     name VARCHAR(100) NULL,
>     root_provider_id INT NULL,
>     parent_provider_id INT NULL
> );
>
> CREATE TABLE inventories (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     resource_class_id INT NOT NULL,
>     total INT NOT NULL,
>     reserved INT NOT NULL,
>     min_unit INT NOT NULL,
>     max_unit INT NOT NULL,
>     step_size INT NOT NULL,
>     allocation_ratio INT NOT NULL
> );
>
> CREATE TABLE allocations (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     consumer_uuid CHAR(36) NOT NULL,
>     resource_class_id INT NOT NULL,
>     used INT NOT NULL
> );
>
>
> Then let's fill the tables with data of the following structure:
>
> -- FPGA-1
> --   +- FPGA-1 slot1 (taken), resource_provider_id:
> --   +- FPGA-1 slot2
> --   |    +- FPGA-1 slot2 acceleratorX
> --   |         +- FPGA-1 slot2 acceleratorX VF1 (taken)
> --   |         +- FPGA-1 slot2 acceleratorX VF2 (taken)
> --   |         +- FPGA-1 slot2 acceleratorX VF3 (taken)
> --   |         +- FPGA-1 slot2 acceleratorX VF4 (taken)
> --   |         +- FPGA-1 slot2 acceleratorX VF5
> --   |         +- ..
> --   |         +- FPGA-1 slot2 acceleratorX VF32
> --   +- FPGA-1 slot3
> -- FPGA-2
> --   +- FPGA-2 slot1
>
> where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also
> assumed that new dynamic resource classes have been created: id 1666
> means 'FPGA' (although it might simply be a standard class, hardcoded
> in the ENUM), 1667 means 'FPGA slot' and 1668 'FPGA accelerator'.
>
>
> INSERT INTO resource_providers VALUES
> (1, '<UUID>', 'FPGA-1', 1, NULL),
> (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
> (3, '<UUID>', 'FPGA-1 slot 2', 1, 1),
> (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
> (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
> (6, '<UUID>', 'FPGA-2', 6, NULL),
> (7, '<UUID>', 'FPGA-2 slot', 6, 6);
>
>
> INSERT INTO inventories VALUES
> (1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
> (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
> (3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
> (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
> (5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
> (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
> (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
>
> INSERT INTO allocations VALUES
> (1, 5, '<UUID>', 1668, 4),
> (2, 2, '<UUID>', 1667, 1);
>
>
> To get the id of a resource of type acceleratorX that can allocate 8 VFs:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 8;
>
>
> Note that I don't have to calculate the total number of available VFs
> in this case. However, a user might schedule a VM which requests more
> VFs than are available in a single accelerator, and then such a
> calculation will be needed.
>
> Requesting more VFs than are available will not return any records:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 29;
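
One observation on the two queries above: because the class filter is on
al.resource_class_id, an accelerator with no allocation rows at all drops
out of the result, and if one accelerator has several allocation rows each
row is compared on its own. Below is a sketch of a variant that filters on
the inventory's class and sums the allocations. It runs against an
in-memory SQLite database with trimmed-down tables. Provider 5 is the
accelerator from the sample data; provider 8 (16 VFs, nothing allocated)
and the function name are hypothetical additions of mine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventories (
    resource_provider_id INTEGER, resource_class_id INTEGER, total INTEGER);
CREATE TABLE allocations (
    resource_provider_id INTEGER, resource_class_id INTEGER, used INTEGER);
-- provider 5: acceleratorX from the sample data (32 VFs, 4 used)
-- provider 8: hypothetical second accelerator with no allocations
INSERT INTO inventories VALUES (5, 1668, 32), (8, 1668, 16);
INSERT INTO allocations VALUES (5, 1668, 4);
""")


def providers_with_free_vfs(conn, wanted):
    """Return ids of accelerators with at least `wanted` unallocated VFs."""
    rows = conn.execute("""
        SELECT iv.resource_provider_id
        FROM inventories iv
        LEFT JOIN allocations al
               ON al.resource_provider_id = iv.resource_provider_id
              AND al.resource_class_id = iv.resource_class_id
        WHERE iv.resource_class_id = 1668
        GROUP BY iv.resource_provider_id, iv.total
        HAVING iv.total - COALESCE(SUM(al.used), 0) >= ?
        ORDER BY iv.resource_provider_id
    """, (wanted,)).fetchall()
    return [row[0] for row in rows]


enough_for_8 = providers_with_free_vfs(conn, 8)    # both accelerators
enough_for_17 = providers_with_free_vfs(conn, 17)  # only provider 5
enough_for_29 = providers_with_free_vfs(conn, 29)  # none: 32 - 4 = 28
```

This ignores reserved and allocation_ratio, which a real scheduler query
would presumably have to take into account.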
>
>
> Nothing fancy here. A more interesting case is getting all
> unallocated slots:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE iv.resource_class_id = 1667
>   AND rp.id NOT IN (
>     SELECT rp.parent_provider_id AS id
>     FROM allocations al
>     LEFT JOIN inventories iv
>       ON al.resource_provider_id = iv.resource_provider_id
>     LEFT JOIN resource_providers rp ON rp.id = iv.resource_provider_id
>     WHERE al.resource_class_id = 1668
>     UNION
>     SELECT iv.resource_provider_id AS id
>     FROM allocations al
>     LEFT JOIN inventories iv
>       ON al.resource_provider_id = iv.resource_provider_id
>     LEFT JOIN resource_providers rp ON rp.id = iv.resource_provider_id
>     WHERE al.resource_class_id = 1667
>   );
>
>
> Or get all unallocated whole FPGAs:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN inventories iv ON rp.id = iv.resource_provider_id
> WHERE iv.resource_class_id = 1666
>   AND rp.id NOT IN (
>     SELECT rp.parent_provider_id
>     FROM resource_providers rp
>     LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
>     WHERE iv.resource_class_id = 1667
>       AND rp.id IN (
>         SELECT rp.parent_provider_id AS id
>         FROM allocations al
>         LEFT JOIN inventories iv
>           ON al.resource_provider_id = iv.resource_provider_id
>         LEFT JOIN resource_providers rp
>           ON rp.id = iv.resource_provider_id
>         WHERE al.resource_class_id = 1668
>         UNION
>         SELECT iv.resource_provider_id AS id
>         FROM allocations al
>         LEFT JOIN inventories iv
>           ON al.resource_provider_id = iv.resource_provider_id
>         LEFT JOIN resource_providers rp
>           ON rp.id = iv.resource_provider_id
>         WHERE al.resource_class_id = 1667
>       )
>   );
>
>
> Those two queries are similar in that, if a user requests a slot or a
> whole FPGA, we have to check that no accelerator in use occupies the
> slot (for the slot query), and make the same check plus an additional
> check for slot usage when querying for a free FPGA.
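
For what it's worth, here is a sketch of the same two checks written with
NOT EXISTS, using root_provider_id for the whole-FPGA case so that the
tree does not have to be walked level by level. It is only an illustration
against in-memory SQLite, loaded with the sample rows quoted above and
with the tables trimmed to the columns the queries touch. The slot check
assumes, as in the sample data, that accelerators sit directly under a
slot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY, name TEXT,
    root_provider_id INTEGER, parent_provider_id INTEGER);
CREATE TABLE inventories (
    resource_provider_id INTEGER, resource_class_id INTEGER, total INTEGER);
CREATE TABLE allocations (
    resource_provider_id INTEGER, resource_class_id INTEGER, used INTEGER);
INSERT INTO resource_providers VALUES
    (1, 'FPGA-1', 1, NULL), (2, 'FPGA-1 slot 1', 1, 1),
    (3, 'FPGA-1 slot 2', 1, 1), (4, 'FPGA-1 slot 3', 1, 1),
    (5, 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, 'FPGA-2', 6, NULL), (7, 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1666, 1), (2, 1667, 1), (3, 1667, 1), (4, 1667, 1),
    (5, 1668, 32), (6, 1666, 1), (7, 1667, 1);
INSERT INTO allocations VALUES (5, 1668, 4), (2, 1667, 1);
""")

# A slot is free if nothing is allocated on it or on a direct child.
FREE_SLOTS = """
SELECT rp.id FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = 1667
  AND NOT EXISTS (
      SELECT 1 FROM allocations al
      JOIN resource_providers used ON used.id = al.resource_provider_id
      WHERE used.id = rp.id OR used.parent_provider_id = rp.id)
ORDER BY rp.id
"""

# An FPGA is free if nothing is allocated anywhere in its tree.
FREE_FPGAS = """
SELECT rp.id FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = 1666
  AND NOT EXISTS (
      SELECT 1 FROM allocations al
      JOIN resource_providers used ON used.id = al.resource_provider_id
      WHERE used.root_provider_id = rp.id)
ORDER BY rp.id
"""

free_slots = [row[0] for row in conn.execute(FREE_SLOTS)]
free_fpgas = [row[0] for row in conn.execute(FREE_FPGAS)]
```

With the sample data this leaves slots 4 and 7 and FPGA 6 (FPGA-2), the
same rows the quoted queries select.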
>
> There is another topic which I haven't thought through yet: potentially
> available resources, meaning an accelerator/IP which might be requested
> during VM boot but doesn't exist yet. In the case of FPGA, it might
> simply be brought up by an external entity (presumably a library or
> service) which will take on the burden of preparing such an
> accelerator/IP on a free slot, and of updating the allocation and
> dynamic resource information.
>
> Thoughts?
>
> --
> Cheers,
> Roman Dobosz
>