[Openstack-operators] PCI passthrough trying to use busy resource?
Jonathan D. Proulx
jon at csail.mit.edu
Tue Oct 18 19:05:46 UTC 2016
Answering my own questions a bit faster than I thought I could.
nova DB has a pci_devices table.
What happened was there was an intermediate state where the
pci_passthrough_whitelist value on the hypervisor was
missing. Apparently during that time the rows for this hypervisor in the
pci_devices table got marked as deleted. Then when nova.conf got
fixed they got recreated (even though the old 'deleted' resources were
really still actively in use).
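For reference, the whitelist entry that went missing looks roughly
like this in nova.conf on the hypervisor (the vendor/product IDs
below are just an example, not necessarily our actual cards):

  [DEFAULT]
  # whitelist of PCI devices nova may pass through, matched by
  # vendor/product ID; get the real IDs from `lspci -nn` on the host
  pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "102d"}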
So I ended up with this colliding state:
> SELECT created_at,deleted_at,deleted,id,compute_node_id,address,status,instance_uuid FROM pci_devices WHERE address='0000:09:00.0';
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| created_at | deleted_at | deleted | id | compute_node_id | address | status | instance_uuid |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| 2016-07-06 00:12:30 | 2016-10-13 21:04:53 | 4 | 4 | 90 | 0000:09:00.0 | allocated | 9269391a-4ce4-4c8d-993d-5ad7a9c3879b |
| 2016-10-18 18:01:35 | NULL | 0 | 12 | 90 | 0000:09:00.0 | available | NULL |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
Since it's really only 3 entries I can fix this by hand, then head over
to bug report land.
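For anyone hitting the same thing, the by-hand fix I'm planning is
roughly the following (a sketch only, not verified beyond my own case;
take a DB backup first, and the ids come from the query output above):

  -- un-delete the row that is actually backing a running instance
  UPDATE pci_devices SET deleted=0, deleted_at=NULL WHERE id=4;
  -- soft-delete the spurious duplicate recreated as 'available'
  UPDATE pci_devices SET deleted=id, deleted_at=NOW() WHERE id=12;

Repeat per colliding address; it's probably also worth restarting
nova-compute on the hypervisor afterwards so the resource tracker
re-reads the corrected state.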
-Jon
On Tue, Oct 18, 2016 at 02:50:11PM -0400, Jonathan D. Proulx wrote:
:Hi all,
:
:I have a test GPU system that seemed to be working properly under Kilo
:running 1- and 2-GPU instance types on an 8-GPU server.
:
:After the Mitaka upgrade it seems to always try to assign the same
:device, which is already in use, rather than pick one of the 5
:currently available.
:
:
: Build of instance 9542cc63-793c-440e-9a57-cc06eb401839 was
: re-scheduled: Requested operation is not valid: PCI device
: 0000:09:00.0 is in use by driver QEMU, domain instance-000abefa
: _do_build_and_run_instance
: /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1945
:
:It tries to schedule 5 times, but each time uses the same busy
:device. Since there are currently only 3 of the 8 in use, it would
:have succeeded if it had just picked a new one each time.
:
:In trying to debug this I realize I have no idea how devices are
:selected. Does OpenStack track which PCI devices are claimed, or is
:that a libvirt function? In either case, where would I look to find
:out what it thinks the current state is?
:
:Thanks,
:-Jon
:--
--