[openstack-dev] FPGA as a dynamic nested resources

Roman Dobosz roman.dobosz at intel.com
Tue Jul 19 18:03:28 UTC 2016


Hi all,

Some time ago Jay Pipes published an etherpad[1] with ideas around
modelling nested resources, taking NUMA as an example. At the last Nova
scheduler meeting I was also encouraged ;) to start this thread.

I read that etherpad, and what struck me was that the scenario
described there for NUMA cells resembles, to some extent, the way FPGAs
can be managed.

A NUMA cell can be treated as a container for memory, expressed as a
number of MB. So it is possible to extract the information from the
existing data and add another level of aggregation using nothing more
than a cleverly prepared SQL query.

I think the problem might be broader than what a slightly tweaked
existing model can cover. If we look at the resources an FPGA may
expose, there can be a couple of levels, and each of them can be
treated as a resource.

I can identify three levels of FPGA resources, nested one within
another, plus a possible fourth:

1. Whole FPGA. If a discrete FPGA is used, it can already be passed
   through to the VM today.

2. Region in FPGA. Some FPGA models can be divided into regions, or
   slots. Some models also allow such a region to be (re)programmed
   individually; in that case an entire slot can be passed to the VM,
   so that the slot can be reprogrammed and the algorithm used from
   within the VM.

3. Accelerator in a region/FPGA. If an accelerator is programmed in
   the slot, it may provide Virtual Functions (similar to SR-IOV), and
   then every available VF can be treated as a resource.

4. It might also be necessary to track every VF individually. I didn't
   assume this would be needed, but with nested resources it should be
   easy to handle.

The relationship between such resources is a bit different from NUMA.
In the NUMA case it is possible either to schedule a VM with some
amount of memory specified, or to request memory within a NUMA cell.
With an FPGA, if a slot is taken, or an accelerator is already
programmed and in use, there is no way to offer the FPGA as a whole to
a tenant until all of its accelerators and slots are free.
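The difference can be illustrated with a small in-memory model. This is only a sketch with a hypothetical `Provider` class (not part of Nova), where each node records the units allocated directly on it: a slot or a whole FPGA may be offered only if neither it nor anything nested below it is in use, whereas NUMA-style accounting just compares a used amount against a total. The tree mirrors the example data used later in this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical in-memory node: a whole FPGA, a slot or an accelerator."""
    name: str
    used: int = 0  # units allocated directly on this provider
    children: list[Provider] = field(default_factory=list)

    def add(self, child: Provider) -> Provider:
        """Nest `child` below this provider and return it for chaining."""
        self.children.append(child)
        return child


def subtree_in_use(p: Provider) -> bool:
    """True if p itself, or any provider nested below it, has an allocation."""
    return p.used > 0 or any(subtree_in_use(c) for c in p.children)


def can_offer_whole(p: Provider) -> bool:
    """A slot or FPGA can go to a tenant only if its whole subtree is free."""
    return not subtree_in_use(p)


# Same layout as the example data in this post.
fpga1 = Provider("FPGA-1")
slot1 = fpga1.add(Provider("FPGA-1 slot 1", used=1))  # slot taken
slot2 = fpga1.add(Provider("FPGA-1 slot 2"))
accel = slot2.add(Provider("FPGA-1 slot 2 acceleratorX", used=4))  # 4 VFs in use
slot3 = fpga1.add(Provider("FPGA-1 slot 3"))
fpga2 = Provider("FPGA-2")
fpga2.add(Provider("FPGA-2 slot 1"))

print([p.name for p in (fpga1, slot1, slot2, slot3, fpga2) if can_offer_whole(p)])
```

Only slot 3 of FPGA-1 and the whole of FPGA-2 pass the check: FPGA-1 is blocked by its taken slot and by acceleratorX, even though most of its capacity is unused.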

I've followed Jay's idea about nested resources and, keeping in mind
the blueprint[2] regarding dynamic resources, worked out how FPGAs
could fit in.

The tables are unchanged; they are copied from the etherpad[1]:


CREATE TABLE resource_providers (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uuid CHAR(36) NOT NULL,
    name VARCHAR(100) NULL,
    root_provider_id INT NULL,
    parent_provider_id INT NULL
);

CREATE TABLE inventories (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL,
    total INT NOT NULL,
    reserved INT NOT NULL,
    min_unit INT NOT NULL,
    max_unit INT NOT NULL,
    step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL
);

CREATE TABLE allocations (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL,
    resource_class_id INT NOT NULL,
    used INT NOT NULL
);


Then let's fill the tables with data of the following structure:

-- FPGA-1
--   +- FPGA-1 slot1 (taken), resource_provider_id:
--   +- FPGA-1 slot2
--       +- FPGA-1 slot2 acceleratorX
--           +- FPGA-1 slot2 acceleratorX VF1 (taken)
--           +- FPGA-1 slot2 acceleratorX VF2 (taken)
--           +- FPGA-1 slot2 acceleratorX VF3 (taken)
--           +- FPGA-1 slot2 acceleratorX VF4 (taken)
--           +- FPGA-1 slot2 acceleratorX VF5
--           +- ..
--           +- FPGA-1 slot2 acceleratorX VF32
--   +- FPGA-1 slot3
-- FPGA-2
--   +- FPGA-2 slot1

where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also
assumed that new dynamic resource classes have been created: id 1666
means 'FPGA' (although it might simply be a standard class, i.e. a
hardcoded ENUM value), 1667 means 'FPGA slot' and 1668 'FPGA
accelerator'.


INSERT INTO resource_providers VALUES
(1, '<UUID>', 'FPGA-1', 1, NULL),
(2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
(3, '<UUID>', 'FPGA-1 slot 2', 1, 1),
(4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
(5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
(6, '<UUID>', 'FPGA-2', 6, NULL),
(7, '<UUID>', 'FPGA-2 slot', 6, 6);


INSERT INTO inventories VALUES
(1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
(2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
(3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
(4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
(5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
(6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
(7, 7, 1667, 1, 0, 1, 1, 1, 1.0);

INSERT INTO allocations VALUES
(1, 5, '<UUID>', 1668, 4),
(2, 2, '<UUID>', 1667, 1);
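To experiment with this data, the schema and rows can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch: the DDL is translated to SQLite syntax (`INTEGER PRIMARY KEY` instead of `AUTOINCREMENT`), and the capacity formula `(total - reserved) * allocation_ratio - used` is an assumption borrowed from Nova's placement model rather than something defined in this post.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax
# (assumption: SQLite stands in for the production database).
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""


def make_db():
    """Return an in-memory database populated with the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def remaining_capacity(conn):
    """Map (provider id, resource class id) -> capacity not yet allocated."""
    rows = conn.execute("""
        SELECT iv.resource_provider_id, iv.resource_class_id,
               (iv.total - iv.reserved) * iv.allocation_ratio
                   - COALESCE(SUM(al.used), 0)
        FROM inventories iv
        LEFT JOIN allocations al
               ON al.resource_provider_id = iv.resource_provider_id
              AND al.resource_class_id = iv.resource_class_id
        GROUP BY iv.id  -- one row per inventory record (iv.id is its key)
    """).fetchall()
    return {(rp, rc): free for rp, rc, free in rows}


print(remaining_capacity(make_db()))
```

With this data acceleratorX (id 5) has 28 of its 32 VFs left and slot 1 (id 2) has nothing left, but slot 2 (id 3) still reports one free unit even though acceleratorX is running in it. Flat per-provider arithmetic cannot see that, which is why the slot and whole-FPGA queries in this post also have to look at nested providers.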


To get the id of an accelerator (resource class 1668) on which 8 VFs
can be allocated:


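-- filter on the inventory class (not the allocation class) so that
-- accelerators without any allocation are found too
SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = 1668
GROUP BY rp.id, iv.total, iv.reserved
HAVING (iv.total - iv.reserved - COALESCE(SUM(al.used), 0)) >= 8;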


Note that I don't have to calculate the total number of available VFs
in this case. However, if a user schedules a VM that requests more VFs
than a single accelerator has available, such a calculation will be
needed.
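If a request has to be spread over several accelerators, one option is to sum the free VFs per host, i.e. per `root_provider_id`. This sketch assumes that VFs from different accelerators on the same host are interchangeable for the request, which this post does not settle. Allocations are pre-aggregated per provider so that no inventory row is counted twice; `hosts_with_free_vfs` is a hypothetical name, and SQLite stands in for the real database.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

# Hosts (root providers) whose accelerators have at least ? free VFs in total.
HOSTS_WITH_FREE_VFS = """
SELECT rp.root_provider_id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
LEFT JOIN (SELECT resource_provider_id, resource_class_id, SUM(used) AS used
           FROM allocations
           GROUP BY resource_provider_id, resource_class_id) al
       ON al.resource_provider_id = rp.id
      AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = 1668
GROUP BY rp.root_provider_id
HAVING SUM(iv.total - iv.reserved - COALESCE(al.used, 0)) >= ?
ORDER BY rp.root_provider_id
"""


def make_db():
    """Return an in-memory database populated with the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def hosts_with_free_vfs(conn, vfs):
    """Return root provider ids whose accelerators together have `vfs` free VFs."""
    return [row[0] for row in conn.execute(HOSTS_WITH_FREE_VFS, (vfs,))]


print(hosts_with_free_vfs(make_db(), 28))
```

With a second 32-VF accelerator programmed into slot 3 of FPGA-1, a request for 40 VFs would be satisfiable on FPGA-1 only by combining both accelerators, which a per-accelerator query cannot express.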

Requesting more VFs than are available returns no records:


SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = 1668
GROUP BY rp.id, iv.total, iv.reserved
HAVING (iv.total - iv.reserved - COALESCE(SUM(al.used), 0)) >= 29;
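Both VF lookups can be checked against an in-memory SQLite copy of the example data, with the VF count as a bound parameter. This sketch filters on the inventory's resource class rather than the allocation's, so an accelerator with no allocations at all is still found, and it sums all allocations per provider; `find_accelerators` is an illustrative name, not an existing API.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

FIND_ACCELERATORS = """
SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = ?
GROUP BY rp.id, iv.total, iv.reserved
HAVING iv.total - iv.reserved - COALESCE(SUM(al.used), 0) >= ?
ORDER BY rp.id
"""


def make_db():
    """Return an in-memory database populated with the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def find_accelerators(conn, vfs, resource_class_id=1668):
    """Return ids of single providers of the class with at least `vfs` free."""
    rows = conn.execute(FIND_ACCELERATORS, (resource_class_id, vfs))
    return [row[0] for row in rows]


conn = make_db()
print(find_accelerators(conn, 8), find_accelerators(conn, 29))
```

With the example data this gives `[5]` for 8 VFs and an empty list for 29, since only 28 of acceleratorX's 32 VFs remain.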


Nothing fancy here. The more interesting cases are getting all
unallocated slots:


SELECT rp.id
FROM resource_providers rp
LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = 1667
AND rp.id not in (
    SELECT rp.parent_provider_id as id
    FROM allocations al
    LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
    LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
    WHERE al.resource_class_id = 1668
    UNION
    SELECT iv.resource_provider_id as id
    FROM allocations al
    LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
    LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
    WHERE al.resource_class_id = 1667
);
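The slot query can be checked by running it unchanged against an in-memory SQLite copy of the example data, using Python's standard `sqlite3` module (the schema is translated to SQLite syntax; this harness is only a test scaffold). Slot 1 of FPGA-1 is allocated directly and slot 2 hosts the busy acceleratorX, so only ids 4 and 7 should come back.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

# The unallocated-slots query exactly as written in this post.
FREE_SLOTS = """
SELECT rp.id
FROM resource_providers rp
LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = 1667
AND rp.id not in (
    SELECT rp.parent_provider_id as id
    FROM allocations al
    LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
    LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
    WHERE al.resource_class_id = 1668
    UNION
    SELECT iv.resource_provider_id as id
    FROM allocations al
    LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
    LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
    WHERE al.resource_class_id = 1667
)
"""


def free_slots(conn):
    """Run the post's slot query; sort because the query has no ORDER BY."""
    return sorted(row[0] for row in conn.execute(FREE_SLOTS))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(free_slots(conn))
```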


Or getting all unallocated whole FPGAs:


SELECT rp.id
FROM resource_providers rp
LEFT JOIN inventories iv on rp.id = iv.resource_provider_id
WHERE iv.resource_class_id = 1666
AND rp.id NOT in (
    SELECT rp.parent_provider_id
    FROM resource_providers rp
    LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
    WHERE iv.resource_class_id = 1667
    AND rp.id in (
        SELECT rp.parent_provider_id as id
        FROM allocations al
        LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
        LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1668
        UNION
        SELECT iv.resource_provider_id as id
        FROM allocations al
        LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
        LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1667
    )
);
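The same kind of scaffold can check the whole-FPGA query, again run unchanged against an in-memory SQLite copy of the example data (schema translated to SQLite syntax). FPGA-1 has an allocated slot and a slot with a busy accelerator, so only FPGA-2 (id 6) should be returned.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

# The unallocated-whole-FPGA query exactly as written in this post.
FREE_FPGAS = """
SELECT rp.id
FROM resource_providers rp
LEFT JOIN inventories iv on rp.id = iv.resource_provider_id
WHERE iv.resource_class_id = 1666
AND rp.id NOT in (
    SELECT rp.parent_provider_id
    FROM resource_providers rp
    LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
    WHERE iv.resource_class_id = 1667
    AND rp.id in (
        SELECT rp.parent_provider_id as id
        FROM allocations al
        LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
        LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1668
        UNION
        SELECT iv.resource_provider_id as id
        FROM allocations al
        LEFT JOIN inventories iv on al.resource_provider_id  = iv.resource_provider_id
        LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1667
    )
)
"""


def free_fpgas(conn):
    """Run the post's whole-FPGA query; sort because it has no ORDER BY."""
    return sorted(row[0] for row in conn.execute(FREE_FPGAS))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(free_fpgas(conn))
```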


These two queries are similar: when a user requests a slot or a whole
FPGA, we have to check that no accelerator in use occupies the slot
(for the slot query); for the whole-FPGA query the same check applies,
plus an additional check that none of its slots is in use.
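Both checks are instances of one rule: an exclusively assigned provider is free only if nothing in its subtree is allocated. As an alternative to hand-nested subqueries, which assume a fixed depth, a recursive CTE can start from every allocated provider and walk up through `parent_provider_id`, collecting all ancestors as busy. This is a sketch assuming SQLite (MySQL 8.0 also supports `WITH RECURSIVE`); it only makes sense for classes that are handed out whole, such as 1666 and 1667, not for accelerator VFs, which are shared. It walks upwards only, so excluding the slots of a whole FPGA that is itself allocated would need a separate downward walk. `free_exclusive` is a hypothetical name.

```python
import sqlite3

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

FREE_EXCLUSIVE = """
WITH RECURSIVE busy(id) AS (
    -- every provider that carries an allocation directly
    SELECT resource_provider_id FROM allocations
    UNION
    -- plus, walking upwards, every ancestor of a busy provider
    SELECT rp.parent_provider_id
    FROM resource_providers rp
    JOIN busy ON busy.id = rp.id
    WHERE rp.parent_provider_id IS NOT NULL
)
SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = ?
  AND rp.id NOT IN (SELECT id FROM busy)
ORDER BY rp.id
"""


def free_exclusive(conn, resource_class_id):
    """Providers of the class with no allocation on themselves or any descendant."""
    return [row[0] for row in conn.execute(FREE_EXCLUSIVE, (resource_class_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(free_exclusive(conn, 1667), free_exclusive(conn, 1666))
```

With the example data this yields ids 4 and 7 for class 1667 and id 6 for class 1666, matching the two queries above. It also drops FPGA-2 once it is allocated as a whole (an allocation of class 1666 on provider 6), a case the nested subqueries do not check.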

There is another topic which I haven't thought through yet: potentially
available resources, i.e. an accelerator/IP which might be requested
during VM boot but doesn't exist yet. In the case of an FPGA, it might
simply be brought up by an external entity (presumably a library or
service), which would take on the burden of preparing such an
accelerator/IP on a free slot and of updating the allocation and
dynamic resource information.
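The registration half of such an external entity could look roughly like the following. Everything here is hypothetical: `program_accelerator` is not an existing Nova or library API, the actual FPGA programming step is left out, and allocations are not touched. The sketch refuses providers that are not slots, slots that are allocated, and slots that already carry a nested provider; it then inserts the new accelerator (inheriting the slot's root) and its VF inventory in one transaction, using SQLite in place of the real database.

```python
import sqlite3
import uuid

# Schema and example data from this post, translated to SQLite syntax.
SETUP = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid CHAR(36) NOT NULL,
    name VARCHAR(100), root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL, reserved INT NOT NULL,
    min_unit INT NOT NULL, max_unit INT NOT NULL, step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL);
CREATE TABLE allocations (id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL, resource_class_id INT NOT NULL, used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (1, '<UUID>', 'FPGA-1', 1, NULL), (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
    (3, '<UUID>', 'FPGA-1 slot 2', 1, 1), (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
    (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, '<UUID>', 'FPGA-2', 6, NULL), (7, '<UUID>', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0), (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES (1, 5, '<UUID>', 1668, 4), (2, 2, '<UUID>', 1667, 1);
"""

FPGA_SLOT, FPGA_ACCELERATOR = 1667, 1668  # dynamic resource class ids from this post


def program_accelerator(conn, slot_id, name, vf_count):
    """Hypothetical hook: register an accelerator just programmed into
    `slot_id` as a nested resource provider exposing `vf_count` VFs."""
    row = conn.execute(
        "SELECT rp.root_provider_id FROM resource_providers rp "
        "JOIN inventories iv ON iv.resource_provider_id = rp.id "
        "WHERE rp.id = ? AND iv.resource_class_id = ?",
        (slot_id, FPGA_SLOT)).fetchone()
    if row is None:
        raise ValueError(f"provider {slot_id} is not an FPGA slot")
    # The slot is unusable if it is allocated itself or already hosts something.
    busy = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM allocations WHERE resource_provider_id = ?) "
        "OR EXISTS (SELECT 1 FROM resource_providers WHERE parent_provider_id = ?)",
        (slot_id, slot_id)).fetchone()[0]
    if busy:
        raise ValueError(f"slot {slot_id} is not free")
    with conn:  # commit provider and inventory together, or roll back both
        cur = conn.execute(
            "INSERT INTO resource_providers "
            "(uuid, name, root_provider_id, parent_provider_id) VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), name, row[0], slot_id))
        new_id = cur.lastrowid
        conn.execute(
            "INSERT INTO inventories (resource_provider_id, resource_class_id, total,"
            " reserved, min_unit, max_unit, step_size, allocation_ratio)"
            " VALUES (?, ?, ?, 0, 1, ?, 1, 1.0)",
            (new_id, FPGA_ACCELERATOR, vf_count, vf_count))
    return new_id


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(program_accelerator(conn, 4, "FPGA-1 slot 3 acceleratorY", 16))
```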

Thoughts?

-- 
Cheers,
Roman Dobosz


