[Openstack] [nova] Database does not delete PCI info after device is removed from host and nova.conf
Eddie Yen
missile0407 at gmail.com
Fri Jul 7 02:37:21 UTC 2017
Hi Jay,
Below are a few logs and some information you may want to check.
I wrote the GPU information into nova.conf like this:
pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de" }, { "product_id":"68c8", "vendor_id":"1002" }]
pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800" }]
Then I restarted the services.
nova-compute log after adding the new GPU device info into nova.conf and
restarting the service:
http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
Strangely, the log shows that the resource tracker only collected information
for the newly installed GPU, not the old one.
But if I perform some actions on the instance that contains the old GPU, the
tracker picks up both GPUs:
http://paste.openstack.org/show/614658/
The Nova database shows correct information for both GPUs:
http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
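(For anyone who wants to check the same thing: the paste above is essentially
the output of a query along these lines against the nova database. I'm
assuming the default MySQL setup that Fuel deploys; adjust as needed.)

    SELECT address, vendor_id, product_id, status, deleted
      FROM pci_devices
     WHERE vendor_id IN ('10de', '1002');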
Now I removed the "1002:68c8" card from both nova.conf and the compute node,
and restarted the services.
pci_passthrough_whitelist and pci_alias now keep only the "10de:0ff3" GPU info:
pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }
The nova-compute log shows the resource tracker reporting that the node only
has the "10de:0ff3" PCI resource:
http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
But in the Nova database, "1002:68c8" still exists and stays in "available"
status, even though the "deleted" value is not zero.
http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
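If it comes down to cleaning this row up by hand, as you suggested in the
earlier mail, I assume something along these lines would do it. This is only
a sketch and I haven't run it, since it would just hide the underlying bug:

    -- the stale row that should have been marked as removed:
    SELECT id, compute_node_id, address, status, deleted, deleted_at
      FROM pci_devices
     WHERE vendor_id = '1002' AND product_id = '68c8';

    -- manual cleanup of that row:
    DELETE FROM pci_devices
     WHERE vendor_id = '1002' AND product_id = '68c8'
       AND status = 'available';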
Many thanks,
Eddie.
2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
> Uh, wait.
>
> Is it possible that it still shows as available because a PCI device still
> exists at the same address?
>
> I ask because when I removed the GPU card, I replaced it with an SFP+
> network card in the same slot.
> So when I run lspci, the SFP+ card sits at the same address.
>
> But that still doesn't make sense, because these two cards definitely do
> not have the same VID:PID,
> and I set the information as VID:PID in nova.conf.
>
>
> I'll try to reproduce this issue and post the logs to this list.
>
> Thanks,
>
> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>
>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>> when you removed the GPU and restarted the nova-compute service and paste
>> those logs to paste.openstack.org? Would be useful in tracking down this
>> buggy behaviour...
>>
>> Best,
>> -jay
>>
>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>
>>> Hi Jay,
>>>
>>> The status of the "removed" GPU still shows as "Available" in the
>>> pci_devices table.
>>>
>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>>
>>>
>>> Hi again, Eddie :) Answer inline...
>>>
>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>
>>> Hi everyone,
>>>
>>> I'm using OpenStack Mitaka version (deployed from Fuel 9.2)
>>>
>>> At present, I have two different models of GPU card installed.
>>>
>>> I wrote this information into pci_alias and
>>> pci_passthrough_whitelist in nova.conf on the Controller and the Compute
>>> node (the one with the GPUs installed),
>>> then restarted nova-api, nova-scheduler, and nova-compute.
>>>
>>> When I checked the database, both GPUs were registered in the
>>> pci_devices table.
>>>
>>> Now I removed one of the GPUs from the compute node, removed the
>>> information from nova.conf, and then restarted the services.
>>>
>>> But when I checked the database again, the information for the removed
>>> card still exists in the pci_devices table.
>>>
>>> What can I do to fix this problem?
>>>
>>>
>>> So, when you removed the GPU from the compute node and restarted the
>>> nova-compute service, it *should* have noticed you had removed the
>>> GPU and marked that PCI device as deleted. At least, according to
>>> this code in the PCI manager:
>>>
>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>
>>> Question for you: what is the value of the status field in the
>>> pci_devices table for the GPU that you removed?
>>>
>>> Best,
>>> -jay
>>>
>>> p.s. If you really want to get rid of that device, simply remove
>>> that record from the pci_devices table. But, again, it *should* be
>>> removed automatically...
>>>