Hey Folks,

I could use a little assistance getting GPU passthrough working. I had this working already for one flavor of nvidia gpu... and I've added some hosts with a much newer gpu...

I've updated the pci_alias and pci_passthrough variables and those seem to be getting set properly in nova.conf

passthrough_whitelist = [{"vendor_id":"10de", "product_id":"1b06"},{"vendor_id":"10de", "product_id":"26b9"}]
alias = {"name": "gpu", "product_id": "1b06", "vendor_id": "10de"}
alias = {"name": "gpu-l40s", "product_id": "26b9", "vendor_id": "10de"}

I believe I have all of the iommu stuff configured and have the pci-stub module entries... dmesg output shows that the GPUs are being claimed by the stub module.

$ openstack flavor show t1.small_gpu_l40s
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| description                | None                                 |
| disk                       | 0                                    |
| id                         | af70c94e-0026-4a39-bc1e-dfb93b286a54 |
| name                       | t1.small_gpu_l40s                    |
| os-flavor-access:is_public | True                                 |
| properties                 | pci_passthrough:alias='gpu-l40s:1'   |
| ram                        | 2048                                 |
| rxtx_factor                | 1.0                                  |
| swap                       | 0                                    |
| vcpus                      | 1                                    |
+----------------------------+--------------------------------------+

Yet... I can't seem to get an instance to run using that new flavor... keeps complaining about there not being enough hosts available.

Fault:
  code: 500
  created: 2024-10-30T02:04:33Z
  message: "No valid host was found. There are not enough hosts available."
  details: |
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 1580, in schedule_and_build_instances
        host_lists = self._schedule_instances(context, request_specs[0],
      File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 940, in _schedule_instances
        host_lists = self.query_client.select_destinations(
      File "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line 41, in select_destinations
        return self.scheduler_rpcapi.select_destinations(context, spec_obj,
      File "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
        return cctxt.call(ctxt, 'select_destinations', **msg_args)
      File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, in call
        result = self.transport._send(
      File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
        return self._driver.send(target, ctxt, message,
      File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send
        return self._send(target, ctxt, message, wait_for_reply, timeout,
      File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 681, in _send
        raise result
    nova.exception_Remote.NoValidHost_Remote: No valid host was found. There are not enough hosts available.
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 241, in inner
        return func(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in select_destinations
        selections = self._select_destinations(
      File "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in _select_destinations
        selections = self._schedule(
      File "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in _schedule
        self._ensure_sufficient_hosts(
      File "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in _ensure_sufficient_hosts
        raise exception.NoValidHost(reason=reason)
    nova.exception.NoValidHost: No valid host was found. There are not enough hosts available.

Any clues on how/where to dig into this more to see what might be missing?

Thanks.

-- Andy Speagle